This article proposes a spatial dynamic structural equation model for the
analysis of housing prices at the State level in the USA. The study contributes 
to the existing literature by extending the use of dynamic factor models to 
the econometric analysis of multivariate lattice data. One of the main ad- 
vantages of our model formulation is that by modeling the spatial variation 
via spatially structured factor loadings, we entertain the possibility of iden- 
tifying similarity "regions" that share common time series components. The 
factor loadings are modeled as conditionally independent multivariate Gaus- 
sian Markov Random Fields while the common components are modeled by 
latent dynamic factors. The general model is proposed in a state-space formu- 
lation where both stationary and nonstationary autoregressive distributed-lag 
processes for the latent factors are considered. For the latent factors which 
exhibit a common trend, and hence are cointegrated, an error correction spec- 
ification of the (vector) autoregressive distributed-lag process is proposed. 
Full probabilistic inference for the model parameters is facilitated by adapt- 
ing standard Markov chain Monte Carlo (MCMC) algorithms for dynamic 
linear models to our model formulation. The fit of the model is discussed for 
a data set of 48 States for which we model the relationship between housing 
prices and the macroeconomy, using state level unemployment and per capita 
personal income. 



1. Introduction. This paper is concerned with the modeling of housing prices
at the State level in the US. Housing is a massive factor in people's consumption. For
industrialized nations, for example, it is the biggest component in the basket of
goods used for calculating the consumer price index. Also, the Bureau of Labor
Statistics estimated in 2010 that about 24 percent of the total consumption of
American home owners goes toward housing. Hence, housing is big enough to 
leave a sizable footprint on the economy in general. 

In the generic sense, housing is also an important social institution in our soci- 
ety. Not only does housing play a major role in any nation's economy, but it also 
provides people with the social values of shelter, security, independence, privacy 
and amenity. The state of the current economy and recent events in the housing
sector have thus led to increased attention on the role of the housing sector in the
economy as a whole.

Economists have studied the relationship between the housing sector and the
macroeconomy since the 1970s. Several socio-economic variables and/or real es- 
tate characteristics are traditionally considered to have an impact on housing prices 
and several studies have thus been dedicated to the determination of fundamental 
factors explaining US housing price variations. Our primary purpose here is not to 
comprehensively examine all these variables. In fact, there is no single generally 
agreed upon set of variables used in testing models of housing prices in the liter- 
ature. For a complete discussion on this point see, for example, Malpezzi (1999), 
Capozza et al. (2002) and Gallin (2008). It is thus beyond the scope of this paper 
to discuss the possible roles played by all fundamental factors in explaining the 
variation of housing prices. Hence, for simplicity, we only examine here the extent 
to which these prices are driven by the real per capita disposable income and the 
unemployment rate. 

1.1. The data: a brief description. The data analyzed in this paper are from
the St. Louis Federal Reserve Bank database^1 and the Bureau of Labor Statistics^2
and consist of quarterly time series on 48 States (excluding Alaska and Hawaii)
from 1984 (first quarter) to 2011 (fourth quarter). Figure 1 shows the time series of
the real housing price index for the 48 States grouped in the eight Bureau
of Economic Analysis (BEA) regions. The time series are expressed in logarithmic
scale; see section 8 for a complete description of the data set.

Figure 1 shows that there are interesting dynamic structures in the time series 
and that periodic patterns and common trend components are consistent features 
of the housing market. Specifically, it appears that housing prices have been rising 
rapidly. Since 1995, we have estimated that, on average, real housing prices have 
increased about 36 percent, roughly double the increase of previous housing price 
booms observed in the late 1980s. Moreover, we notice that housing prices con- 
tinued to rise strongly during the 2001 recession and that the process of housing 
price boom, which some have interpreted as a bubble, started in 1998, accelerated 
during the period 2003-2006 and burst in 2007. Prices have since fallen
sharply across the country.

The ability to model all these dynamic features and to obtain accurate
housing price forecasts is important for prospective homeowners, investors,
appraisers and other real estate market participants such as mortgage lenders and 
insurers. 

The way in which housing prices spread out to surrounding locations over time
is also of interest in the real estate literature. The co-movements shown by the
time series within BEA regions suggest the presence of spatial correlation. As
stated in Holly, Pesaran and Yamagata (2010), it is possible that contiguous
States influence each other's housing prices. In fact, high prices in metropolitan
areas may persuade people to commute from neighboring States. Labour mobility
is quite high in the USA and lower housing prices may provide an incentive
to migrate. Another possible source of cross-sectional dependence is
economy-wide common shocks that affect all cross section units. Changes in
interest rates, oil prices and technology are examples of such common shocks that
may affect housing prices, although to different degrees across States.

Fig 1. Time series of the log-transformed real housing price index. The 48 States are grouped
in the eight Bureau of Economic Analysis (BEA) regions.

^1 http://research.stlouisfed.org/fred2/
^2 http://stats.bls.gov/cpi/home.htm#data

To explore the existence of spatial interactions, Table 1 shows, for the growth of
real housing prices, the simple correlation coefficients between States, averaged
within and between the eight BEA regions. The diagonal elements show the
within-region average correlation coefficients while the off-diagonal elements
give the between-region correlation coefficients. Apart from the States of the
Southeast, which are more correlated on average with the States of the Great
Lakes than among themselves, the within-region correlation is larger than the
between-region correlation. On average the correlations decline with distance,
but it is interesting to note the quite high correlations between the East and
West regions, i.e. for States belonging to the Mideast and Far West regions. In general,
there is more evidence in the raw data of a possible spatial pattern in real housing
prices than in real incomes and unemployment rates.
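
The averaging behind a table of this kind can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `region_correlations`, the synthetic inputs, and the choice to average within-region correlations over distinct State pairs (excluding each State's unit self-correlation) are assumptions made for the example.

```python
import numpy as np

def region_correlations(growth, regions):
    """Average pairwise correlations within and between regions.

    growth  : (T, N) array, one column of price growth rates per State
    regions : length-N sequence of region labels (hypothetical input)
    """
    regions = np.asarray(regions)
    labels = sorted(set(regions.tolist()))
    corr = np.corrcoef(growth, rowvar=False)      # (N, N) State-by-State correlations
    out = np.full((len(labels), len(labels)), np.nan)
    for a, ra in enumerate(labels):
        for b, rb in enumerate(labels):
            block = corr[np.ix_(regions == ra, regions == rb)]
            if a != b:
                out[a, b] = block.mean()          # between-region average
            elif block.shape[0] > 1:
                k = block.shape[0]
                # within-region average over distinct pairs only (assumption)
                out[a, b] = (block.sum() - np.trace(block)) / (k * (k - 1))
    return labels, out

# Synthetic check: two regions with two States each.
rng = np.random.default_rng(0)
a, b = rng.standard_normal(50), rng.standard_normal(50)
growth = np.column_stack([a, 2 * a + 1, b, -b])
labels, table = region_correlations(growth, ["A", "A", "B", "B"])
```

With real data, `growth` would hold the first differences of the log real housing price index and `regions` the BEA region of each State.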

Table 1

Average correlation coefficients within and between regions for the first difference of the log of real
housing prices. BEA regions: New England (NE), Mideast (ME), Great Lakes (GL),
Plains (PL), Southeast (SE), Southwest (SW), Rocky Mountain (RM), Far West (FW).

      NE    ME    GL    PL    SE    SW    RM    FW
PL                      0.50
SE    0.35  0.40  0.48
RM    0.10  0.17  0.30  0.46  0.37  0.48  0.50
FW    0.33  0.46  0.42  0.34  0.37  0.40  0.41  0.50

1.2. Related literature and the proposed model. Modeling the spatio-temporal 
variability of housing prices has enjoyed widespread popularity in recent years.
In order to obtain a high degree of accuracy in the results, the analysis of housing 
prices across US States requires the definition of a general and flexible economet- 
ric model where the temporal and cross-sectional dependencies must be accommo- 
dated. Several efforts have been made to develop spatiotemporal models but there 
is no single approach which can be considered uniformly as being the most ap- 
propriate. For example, time series models have become increasingly sophisticated 
in their treatment of dynamics and trends over time, including the application of 
unit roots and cointegration techniques (Giussani and Hadjimatheou, 1991; Meen, 
2001; Muellbauer and Murphy, 1997). However, traditional approaches, such as 
those based on standard vector autoregression analysis (VAR), do not allow for a 
direct modeling of locational spillovers and are thus not consistent with the "rip- 
ple effect" theory (Meen, 1999). A spatial adaptation of VARs, denoted as SpVAR 
models, explicitly considers the potential impacts of economic events in neigh- 
boring States and has been discussed in Kuethe and Pede (2011). The SpVAR 
is a specific version of the Spatio-Temporal Auto-Regressive Moving Average - 
STARMA - model introduced by Pfeifer and Deutsch (1980) where the linear de- 
pendencies are lagged in both space and time. Since STARMAs are an extension 
of the ARMA class of models (Box, Jenkins and Reinsel, 1994) they are partic- 
ularly useful to produce temporal forecasts of the variable of interest. However, 
the STARMA specification also suffers from some disadvantages. First, because 
of the amount of computational effort required, STARMAs are in general only
suitable for modeling data which are dense in time and sparse in space. For example,
in Kuethe and Pede (2011) the analysis is limited to 11 States (i.e. the West
Region). Secondly, the understanding of co-movements among US State housing
prices (and other involved variables) is difficult when the number of States is
large. Knowledge of this covariation is required both by academics seeking to explain
the economic nature and sources of variation and by practitioners involved
in the development of trading strategies. Thirdly, as argued by Anselin (1988, pp.
11-14), the STARMA class does not offer a fully adequate modeling of the spatial
dependence and heterogeneity of observations. The lack of an adequate treatment
of simultaneous (instantaneous) spatial dependence is also the main point of criticism
raised by Cressie (1993, p. 450) against the STARMA methodology. In fact, in
its standard specification, STARMA implicitly assumes that, conditional on past
observations, the process is uncorrelated across space. This is undoubtedly a major
shortcoming, since many observed series, as noted for example by Pfeifer and
Deutsch (1981), show considerable contemporaneous correlation even after conditioning
on the past history of the process. When the contemporaneous correlation
is considered by the model, the observations become a nonlinear transformation
of the innovations and, as a result, maximum likelihood estimation becomes much
more difficult (Elhorst, 2001; Di Giacinto et al., 2005).

Seemingly Unrelated Regression (SUR) and error correction panel data models 
(see for example, Meen, 2001; Cameron, Muellbauer and Murphy, 2006) have also 
been widely used with spatial and time effects to investigate the evolution of housing
prices. Apart from their rather complex structure, like STARMAs, these models
are not suitable when the number of regions is relatively large. In fact, the application
of an unrestricted SURE-GLS approach to large N (cross section dimension)
and T (time series dimension) panels involves nuisance parameters that increase
at a quadratic rate as the cross section dimension of the panel is allowed to rise
(Pesaran, 2006).

Recent research has found that in a data rich environment, dimension reduction 
in the form of factors is useful for exploratory analysis, prediction and policy analy- 
sis. Factor analysis assumes that the cross dependence can be characterized by a fi- 
nite number of unobserved common factors, possibly due to economy-wide shocks 
that affect all States, albeit with different intensities. Thus, strong co-movement 
and high correlation among the series suggest that both observable and unobservable
factors must be at play. The effects of common shocks on housing prices
have been taken into consideration in van Dijk et al. (2011) and Holly et al. (2010)
by making use of the common correlated effects estimator (CCE, Pesaran, 2006)
which controls for heterogeneity and spatial dependence. In these studies, the authors
develop a panel data model where fixed mean effects, cointegration, cross-equation
correlations and latent factors are considered. Furthermore, they show that
by approximating the linear combinations of the unobserved factors by cross sec- 
tion averages of the dependent and explanatory variables, and by running standard 
panel regressions augmented with these cross section averages, spatial dependency 
can be eliminated. 

Unlike these authors, we approach the analysis from the perspective of
recent developments of dynamic factor models in the literature of spatio-temporal
processes. We assume that the observed process can be modeled by a temporally
dynamic and spatially descriptive model, hereafter referred to as spatial dynamic
structural equation model - SD-SEM. There are some important differences between
our approach and the one discussed by Holly et al. (2010) and van Dijk et al.
(2011). Firstly, we do not use cross section averages
to eliminate cross-sectional dependencies. Instead, our model formulation exploits
the spatio-temporal nature of the data and explicitly defines a non-separable
spatio-temporal covariance structure of the multivariate process. Secondly, because of the
high dimensionality of the data, dimension reduction is important and we suggest 
modeling the temporal relationship between dependent and regressor variables in 
a latent space. The observed processes are thus described by a potentially small set 
of common dynamic latent factors. For all possible model candidates which may 
be specified, we use a multivariate autoregressive distributed-lag specification for 
these latent processes and, to account for situations in which two or more latent 
factors appear to exhibit a common trend, their cointegrating relationship is con- 
sidered. Thirdly, by modeling the spatial variation via spatially structured factor 
loadings, we entertain the possibility of identifying clusters of States that share 
common time series components. This is one of the main advantages of our model 
formulation. Lastly, the model naturally allows for producing temporal and spatial 
predictions of the variables of interest. Note that although spatial interpolation is 
not a main task in lattice data applications, it may be an important issue in terms 
of missing data reconstruction (i.e. partial or total reconstruction of the housing 
price time series). This problem would not be easily addressed by the other model 
formulations discussed above. 

The SD-SEM represents a multivariate extension of the model recently pro- 
posed by Ippoliti, Gamerman and Valentini (2012) for modeling environmental 
coupled (correlated) spatio-temporal processes. Our spatio-temporal data are thus 
multivariate, in that more than one variable is typically measured at specific spatial 
sites (States) and different temporal instants. Furthermore, as in Lopes, Salazar and 
Gamerman (2008) and Ippoliti et al. (2012), we assume that the spatial dependence 
can be modeled through the columns of the factor loading matrices. However, unlike
these authors, who refer to applications with spatially continuous
(i.e. geostatistical) processes, we consider here applications with lattice data such
that the factor loadings can be modeled as conditionally independent multivariate
Gaussian Markov Random Fields - GMRFs. While models for multivariate geostatistical
data have been extensively explored, models for lattice data have received
less attention in the literature. For recent methodological developments the reader is
referred to Sain and Cressie (2007), Sain, Furrer and Cressie (2011) and references
therein.

The SD-SEM is developed within a state-space framework and full probabilistic 
inference for the parameters is facilitated by Markov chain Monte Carlo (MCMC). 

The remainder of the paper is organized as follows. In section 2, we describe
the general dynamic latent model while in section 3 specific attention is given
to models which incorporate general forms of the spatial correlations and
cross-correlations between variables at different locations. In section 4 we describe the
state-space formulation and in section 5 discuss the nonstationary cases for the 
temporal dynamics of the latent factors. In section 6 we consider Bayesian infer- 
ential issues and in section 7 we describe forecasting strategies. In section 8 we 
discuss fits of the model to the data set of US real housing prices while section 9 
concludes the paper. 

2. The spatial dynamic structural equation model. Often observations are 
multivariate in nature, i.e., we obtain vector responses at locations across space. For
such data, we need to model both association between measurements at a location 
as well as association between measurements across locations. With increased col- 
lection of such multivariate spatial data, there arises the need for flexible explana- 
tory stochastic models in order to improve estimation precision (see, for example, 
Kim, Sun and Tsutakawa, 2001) and to provide simple descriptions of the complex 
relationships existing among the variables. In the following, a model formulation 
which describes the structural relations among the variables in a lower dimensional 
space is presented. 

Assume that $Y$ and $X$ are two multivariate (multidimensional) spatio-temporal
processes; that is, assume that several variables are measured at the node or interior
(State), $s$, of a lattice $\mathcal{C}$ and temporal instant $t \in \{1, 2, \ldots, T\}$. Hence, for $n_y$
variables, we write $Y(s,t) = [Y_1(s,t), \ldots, Y_{n_y}(s,t)]'$, and the same holds for $X$,
for $n_x$ variables. It is explicitly assumed that $X$ is a predictor of $Y$, which is the
process of interest.

Also, assume that $N$ is the number of locations in $\mathcal{C}$ and let $\tilde{n}_y = n_y N$ and
$\tilde{n}_x = n_x N$. Then, at a specific time $t$, the $(\tilde{n}_y \times 1)$ and $(\tilde{n}_x \times 1)$ dimensional
spatial processes, $Y$ and $X$, are denoted as $Y(t) = [Y(s_1,t)', \ldots, Y(s_N,t)']'$ and
$X(t) = [X(s_1,t)', \ldots, X(s_N,t)']'$.

Our model assumes that each multivariate spatial process, at a specific time $t$,
has the following linear structure

(1) $X(t) = m_x(t) + H_x f(t) + u_x(t)$

(2) $Y(t) = m_y(t) + H_y g(t) + u_y(t)$

where $m_y(t)$ and $m_x(t)$ are $(\tilde{n}_y \times 1)$ and $(\tilde{n}_x \times 1)$ mean components modeling
the smooth large-scale temporal variability, $H_y$ and $H_x$ are measurement (factor
loadings) matrices of dimensions $(\tilde{n}_y \times m)$ and $(\tilde{n}_x \times l)$, respectively, and $g(t)$ and
$f(t)$ are $m$- and $l$-dimensional vectors of temporal common factors. Also, $u_y(t)$
and $u_x(t)$ are Gaussian error terms for which we assume $u_y(t) \sim N(0, \Sigma_{u_y})$ and
$u_x(t) \sim N(0, \Sigma_{u_x})$. For simplicity, throughout the paper it is assumed that $\Sigma_{u_y}$
and $\Sigma_{u_x}$ are both diagonal matrices and that $m \ll \tilde{n}_y$ and $l \ll \tilde{n}_x$.

The temporal dynamic of the common factors is then modeled through the following
state equations

(3) $g(t) = \sum_{i=1}^{p} C_i g(t-i) + \sum_{j=1}^{q} D_j f(t-j) + \xi(t)$

(4) $f(t) = \sum_{k=1}^{s} R_k f(t-k) + \eta(t)$

where $C_i$ $(m \times m)$, $D_j$ $(m \times l)$, and $R_k$ $(l \times l)$ are coefficient matrices modeling
the temporal evolution of the latent vectors $g(t) = [g_1(t), \ldots, g_m(t)]'$ and $f(t) =
[f_1(t), \ldots, f_l(t)]'$, respectively. Finally, $\xi(t)$ and $\eta(t)$ are independent Gaussian
error terms for which we assume $\xi(t) \sim N(0, \Sigma_\xi)$ and $\eta(t) \sim N(0, \Sigma_\eta)$.

Equation (3) represents a Vector Autoregressive model with exogenous variables 
(VARX) where the variables in g(t), considered as endogenous (i.e. determined 
within the system), are controlled for the effects of other variables, f(t), consid- 
ered as exogenous (i.e. determined outside the system and treated independently 
of the other variables)^3. Equations (1-4) thus provide the basic formulation of the
SD-SEM. One advantage of this model is that temporal forecasts of the variable of
interest, Y, can be obtained by modeling the dynamics of a few common factors.
Also, the model is spatially descriptive in that it can be used to identify possible
clusters of locations whose temporal behavior is primarily described by a potentially
small set of common dynamic latent factors. As will be shown in the next
section, flexible and spatially structured prior information regarding such clusters
can be specified through the columns of the factor loading matrix.

^3 The distinction between exogenous and endogenous variables in a model is subtle and is the subject
of a long debate in the literature. See, for example, Engle, Hendry and Richard (1983), Osiewalski
and Steel (1996). Gourieroux and Monfort (1997, Chapter 10) also provide a clear distinction between
the different exogeneity concepts.
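
To make the structure of the measurement and state equations concrete, the following Python sketch simulates a small instance of the model. It is an illustration under stated assumptions rather than the authors' implementation: lag orders are fixed at p = q = s = 1, the mean components are set to zero, the error covariances are scalar multiples of the identity, and the function name `simulate_sdsem` and all numerical values are hypothetical.

```python
import numpy as np

def simulate_sdsem(T, Hy, Hx, C, D, R, sd_state=1.0, sd_obs=0.1, seed=0):
    """Simulate equations (1)-(4) with p = q = s = 1 and zero mean components."""
    rng = np.random.default_rng(seed)
    m, l = C.shape[0], R.shape[0]
    g = np.zeros((T + 1, m))                       # g[0], f[0] are zero initial states
    f = np.zeros((T + 1, l))
    for t in range(1, T + 1):
        f[t] = R @ f[t - 1] + sd_state * rng.standard_normal(l)                 # eq. (4)
        g[t] = C @ g[t - 1] + D @ f[t - 1] + sd_state * rng.standard_normal(m)  # eq. (3)
    X = f[1:] @ Hx.T + sd_obs * rng.standard_normal((T, Hx.shape[0]))           # eq. (1)
    Y = g[1:] @ Hy.T + sd_obs * rng.standard_normal((T, Hy.shape[0]))           # eq. (2)
    return Y, X, g[1:], f[1:]

# Toy dimensions: N = 48 States, n_y = 1, n_x = 2, m = l = 2 factors, 112 quarters.
rng = np.random.default_rng(1)
Hy, Hx = rng.standard_normal((48, 2)), rng.standard_normal((96, 2))
C, D, R = 0.5 * np.eye(2), 0.3 * np.eye(2), 0.7 * np.eye(2)
Y, X, g, f = simulate_sdsem(112, Hy, Hx, C, D, R)
```

The point of the sketch is the dimension reduction: 48 and 96 observed series are driven by only two latent factors each, and the loadings in `Hy` and `Hx` determine how strongly each State follows each factor.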
3. Factor loadings and multivariate GMRFs. A key property of much spatio-temporal
data is that observations at nearby sites and times will tend to be similar
to one another. This underlying smoothness characteristic of a space-time process 
can be captured by estimating the state process and filtering out the measurement 
noise. It is customary for dynamic latent models to refer to the unobserved (state) 
processes as the common factors and to refer to the coefficients that link the fac- 
tors with the observed series as the factor loadings. It is assumed that these factor 
loadings have the nature of spatial processes and, extending results in Ippoliti et 
al. (2012), here the spatial dependence is modeled through a multivariate GMRF. 
Relevant papers useful for our purposes are Mardia (1988) and Sain and Cressie 
(2007), and we refer to them for known results on the model formulation. 

Let $h_{x_j} = [h_{x_j}(s_1)', h_{x_j}(s_2)', \ldots, h_{x_j}(s_N)']'$, i.e. the $j$-th column of $H_x$, be
a $\tilde{n}_x$-dimensional spatial process observed on $\mathcal{C}$ - and similarly for $H_y$. Also, let
$[h_{x_j}(s_i) | R_{-i}]$ denote the conditional distribution of $h_{x_j}(s_i)$ given the rest (i.e. values
at all other sites). Then, the GMRF is defined by the conditional mean

(5) $E(h_{x_j}(s_i) | R_{-i}) = \mu_i^{(h_{x_j})} + \sum_{u \in \delta_i} F_{iu}^{(h_{x_j})} (h_{x_j}(s_u) - \mu_u^{(h_{x_j})})$

and the conditional covariance matrix

(6) $\mathrm{Var}(h_{x_j}(s_i) | R_{-i}) = T_i^{(h_{x_j})}$

where $\delta_i$ is a finite subset of $\mathcal{C}$ containing neighbors of site $s_i$, $\mu_i^{(h_{x_j})}$ is a
$n_x$-dimensional mean vector, and $F_{iu}^{(h_{x_j})}$ is a $(n_x \times n_x)$ matrix of spatial regression
parameters.

To take into account the effect of some explanatory variables, it is possible to
parameterize the mean vector through the definition of a $(\tilde{n}_x \times q)$ design matrix
$D^*$, such that $\mu^{(h_{x_j})} = D^* \beta^{(h_{x_j})}$, with $\beta^{(h_{x_j})}$ a $(q \times 1)$ vector of parameters.
Assuming $c_i$ is a vector of covariates for the $i$-th location, we have
$\mu_i^{(h_{x_j})} = D_i^* \beta^{(h_{x_j})}$, with $D_i^* = (I_{n_x} \otimes c_i')$, $i = 1, \ldots, N$, and $\otimes$
denoting the Kronecker product. For a discussion
of different specifications of the matrix $D^*$, see for example Ippoliti et al.
(2012) and Lopes et al. (2008). However, due to the static behavior of $h_{x_j}$, only
spatially-varying covariates will be considered in explaining the mean level of the
GMRF.

With the definition of the conditional distributions, it follows (see Mardia, 1988)
that the joint distribution of $h_{x_j}$ is $MVN(\mu^{(h_{x_j})}, \Sigma^{(h_{x_j})})$ with the covariance
matrix specified as

$\Sigma^{(h_{x_j})} = \{ \mathrm{block}[ -(T_i^{(h_{x_j})})^{-1} F_{iu}^{(h_{x_j})} ] \}^{-1}$,

where $F_{ii}^{(h_{x_j})} = -I$ and
for a generic matrix $G$, $\mathrm{block}[G_{iu}]$ denotes a block matrix with the $(i,u)$th block
given by $G_{iu}$ (see Sain and Cressie, 2007). To guarantee that a proper probability
density function is defined, the parametrization must ensure that $\Sigma^{(h_{x_j})}$ is positive-definite
and symmetric; hence, we require both
$F_{iu}^{(h_{x_j})} T_u^{(h_{x_j})} = T_i^{(h_{x_j})} F_{ui}^{(h_{x_j})'}$ and
$\mathrm{block}[ -(T_i^{(h_{x_j})})^{-1} F_{iu}^{(h_{x_j})} ]$
positive definite.
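
The passage from the conditional specification to a joint covariance can be checked numerically on a toy lattice. The sketch below is a simplified illustration, not the authors' code: it assumes the same conditional covariance at every site and a single spatial regression matrix shared by all neighbor pairs, and the names `gmrf_precision` and `is_valid` as well as the numerical values are hypothetical.

```python
import numpy as np

def gmrf_precision(W, T_cond, F):
    """Precision matrix block[-T^{-1} F_iu] with F_ii = -I and F_iu = F for neighbors.

    W      : (N, N) symmetric 0/1 adjacency matrix of the lattice
    T_cond : (n, n) conditional covariance, taken equal at all sites (assumption)
    F      : (n, n) spatial regression matrix shared by all neighbor pairs (assumption)
    """
    T_inv = np.linalg.inv(T_cond)
    Q = np.kron(np.eye(W.shape[0]), T_inv)   # diagonal blocks: -T^{-1}(-I) = T^{-1}
    Q -= np.kron(W, T_inv @ F)               # neighbor blocks: -T^{-1} F
    return Q

def is_valid(Q, T_cond, F):
    """Check the symmetry condition F T = T F' and positive definiteness of Q."""
    symmetric = np.allclose(F @ T_cond, T_cond @ F.T)
    return bool(symmetric and np.linalg.eigvalsh((Q + Q.T) / 2).min() > 0)

# Four sites on a line, two variables per site.
W = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
T_cond = np.array([[1.0, 0.3], [0.3, 1.0]])
F = 0.4 * np.eye(2)
Q = gmrf_precision(W, T_cond, F)
Sigma = np.linalg.inv(Q)                     # joint covariance of the stacked field
```

In this simplified setting the validity check fails once the spatial regression coefficient is too large relative to the connectivity of the lattice, which is why the parametrization has to be constrained.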



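4. The state space formulation. As shown in section 2, the temporal dynamic
is modeled through the state equations (3) and (4). The specification of equation
(4) is necessary to predict in time the latent process $f(t)$ and thus to obtain $k$-step
ahead forecasts of $g(t)$ through equation (3). It is thus useful to specify the joint
generation process for $g(t)$ and $f(t)$ as

(7) $\begin{bmatrix} g(t) \\ f(t) \end{bmatrix} = \begin{bmatrix} C_1 & D_1 \\ 0 & R_1 \end{bmatrix} \begin{bmatrix} g(t-1) \\ f(t-1) \end{bmatrix} + \ldots + \begin{bmatrix} C_p & D_p \\ 0 & R_p \end{bmatrix} \begin{bmatrix} g(t-p) \\ f(t-p) \end{bmatrix} + \begin{bmatrix} \xi(t) \\ \eta(t) \end{bmatrix}$

where it is assumed without loss of generality that $p \geq \max(s, q)$, $D_i = 0$ for
$i > q$ and $R_j = 0$ for $j > s$. It follows that the joint generation process of $g(t)$
and $f(t)$ is a VAR($p$) process of the type

(8) $d(t) = \Phi_1 d(t-1) + \ldots + \Phi_p d(t-p) + e(t)$

where

$d(t) = \begin{bmatrix} g(t) \\ f(t) \end{bmatrix}$, $\Phi_i = \begin{bmatrix} C_i & D_i \\ 0 & R_i \end{bmatrix}$, $e(t) = \begin{bmatrix} \xi(t) \\ \eta(t) \end{bmatrix}$.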

The presence of the measurement and the state variables naturally leads to the
state-space representation (Lütkepohl, 2005) of the SD-SEM model; given the data,
this representation allows for a recursive estimate of the latent variables through the
Kalman filter algorithm. The linear Gaussian state-space model is thus described
by the following state and measurement equations

(9) $\alpha(t) = \Phi \alpha(t-1) + \Xi \zeta(t)$

(10) $z(t) = H \alpha(t) + u(t)$

where $\alpha(t)$ is the state vector, $\Phi$ is the nonsingular transition matrix, $\Xi$ is a constant
input matrix, $z(t)$ is the measurement vector and $H$ is the measurement matrix.
The sequences $\zeta(t)$ and $u(t)$ are assumed to be mutually independent zero
mean Gaussian random variables with covariances $E\{\zeta(t_i)\zeta(t_j)'\} = \Sigma_\zeta \delta_{ij}$ and
$E\{u(t_i)u(t_j)'\} = \Sigma_u \delta_{ij}$, where $E\{\cdot\}$ denotes the expectation and $\delta_{ij}$ the Kronecker
delta function. In (9) and (10) we have the following specification

$\alpha(t) = \begin{bmatrix} d(t) \\ d(t-1) \\ \vdots \\ d(t-p+1) \end{bmatrix}$, $\Phi = \begin{bmatrix} \Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p \\ I & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I & 0 \end{bmatrix}$, $\Xi \zeta(t) = \begin{bmatrix} e(t) \\ 0 \\ \vdots \\ 0 \end{bmatrix}$,

$z(t) = \begin{bmatrix} y(t) \\ x(t) \end{bmatrix}$, $H = \begin{bmatrix} H_y & 0 & 0 \\ 0 & H_x & 0 \end{bmatrix}$, $u(t) = \begin{bmatrix} u_y(t) \\ u_x(t) \end{bmatrix}$,

where the last block column of $H$ contains $(p-1)(m+l)$ columns of zeros, corresponding to the lagged
elements of the state vector.
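
As an illustration of how the state-space form is used, the following Python sketch stacks VAR(p) coefficient matrices into a companion transition matrix and runs a textbook Kalman filter. It is a minimal sketch, not the authors' algorithm: the function names `companion` and `kalman_filter`, the argument `Q` (taken here to be the covariance of the full stacked state noise), and the toy dimensions are assumptions made for the example.

```python
import numpy as np

def companion(Phis):
    """Stack VAR(p) matrices Phi_1..Phi_p into the transition matrix of eq. (9)."""
    p, k = len(Phis), Phis[0].shape[0]
    Phi = np.zeros((p * k, p * k))
    Phi[:k, :] = np.hstack(Phis)            # first block row: Phi_1 ... Phi_p
    Phi[k:, :-k] = np.eye((p - 1) * k)      # identity blocks shift the lags down
    return Phi

def kalman_filter(z, Phi, H, Q, Su, a0, P0):
    """Return the filtered state means; Q is the covariance of the stacked state noise."""
    a, P = a0, P0
    out = []
    for zt in z:
        a, P = Phi @ a, Phi @ P @ Phi.T + Q          # prediction step
        S = H @ P @ H.T + Su                         # innovation covariance
        K = np.linalg.solve(S, H @ P).T              # Kalman gain P H' S^{-1}
        a = a + K @ (zt - H @ a)                     # updated mean
        P = P - K @ H @ P                            # updated covariance
        out.append(a)
    return np.array(out)

# Toy VAR(2) for a bivariate d(t), observed through a hypothetical 3 x 2 loading matrix.
Phis = [0.5 * np.eye(2), 0.2 * np.eye(2)]
Phi = companion(Phis)
H = np.hstack([np.ones((3, 2)), np.zeros((3, 2))])   # zero block for the lagged d(t-1)
Q = np.zeros((4, 4)); Q[:2, :2] = np.eye(2)          # covariance of [e(t)', 0']'
z = np.random.default_rng(0).standard_normal((20, 3))
alpha_hat = kalman_filter(z, Phi, H, Q, 0.1 * np.eye(3), np.zeros(4), np.eye(4))
```

Within an MCMC scheme the filter would typically be combined with a backward sampling step to draw the latent factors; that step is not shown here.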



5. Nonstationary latent factors. The dynamic specification for the state vector
$\alpha(t)$ is quite general. In fact, the family of time series processes that can be
formulated as in equations (9) and (10) is wide and includes a broad range of
nonstationary time series processes. Sometimes it may be advantageous to have 
a specification that decomposes the latent factors into stationary and nonstationary 
components, such as trend, periodic or cyclical components. The large scale dy- 
namic components can in fact be directly specified through the common dynamic 
factors. In this case, for example, common seasonal factors can receive different 
weights for different columns of the factor loading matrix, so allowing different 
seasonal patterns for the spatial locations. For some specific examples, and for a 
wider discussion on this point, see Lopes et al. (2008) and Ippoliti et al. (2012). 

5.1. Cointegrated latent factors. Nonstationarity can also occur when two or
more latent factors appear to exhibit a common trend, and hence are cointegrated
(Johansen, 1988). In this case we have that one or more linear combinations of
these factors are stationary even though individually they are not. If the factors are
cointegrated, they cannot move too far away from each other and we should observe
a stable long-run relationship among their levels. In contrast, a lack of cointegration
suggests that such factors have no long-run link and, in principle, they can
wander arbitrarily far away from each other.

In our model formulation we consider the case in which the exogenous factors
are cointegrated among themselves as well as with the endogenous latent variables.
In this case the vector autoregressive process of equation (8) can be written in the
error correction model (ECM) form as

(11) $\Delta d(t) = A d(t-1) + \sum_{i=1}^{p-1} \Phi_i^* \Delta d(t-i) + e(t)$

where $A = -I + \sum_{i=1}^{p} \Phi_i$, $\Phi_i^* = -\sum_{j=i+1}^{p} \Phi_j$, and $\Delta$ is the difference operator,
i.e. $\Delta d(t) = d(t) - d(t-1)$. Full details of the vector error correction specification
of equation (11) are provided in Appendix A where we also show that the matrix of
long-run multipliers, $A$, is an upper block triangular matrix. These single blocks,
expressed as a product of parameter matrices, provide information about: i) the
cointegration structure within the exogenous and endogenous processes $f(t)$ and
$g(t)$, and ii) the cointegration between the two processes.
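
The equivalence between the VAR form and the error correction form can be verified numerically. The Python sketch below uses the standard definitions of the long-run matrix (minus the identity plus the sum of the VAR coefficient matrices) and of the short-run matrices (minus the sum of the higher-order VAR coefficient matrices). The function names and the random test values are assumptions for illustration only.

```python
import numpy as np

def var_to_ecm(Phis):
    """Map VAR(p) matrices Phi_1..Phi_p to the ECM matrices A and Phi*_1..Phi*_{p-1}."""
    k = Phis[0].shape[0]
    A = sum(Phis) - np.eye(k)                                  # A = -I + sum_i Phi_i
    Phi_star = [-sum(Phis[i:]) for i in range(1, len(Phis))]   # Phi*_i = -sum_{j>i} Phi_j
    return A, Phi_star

def var_step(Phis, lags):
    """Noise-free VAR prediction; lags[0] = d(t-1), lags[1] = d(t-2), ..."""
    return sum(P @ d for P, d in zip(Phis, lags))

def ecm_step(Phis, lags):
    """Same prediction computed through the error correction form."""
    A, Phi_star = var_to_ecm(Phis)
    delta = A @ lags[0]
    for i, Ps in enumerate(Phi_star):
        delta = delta + Ps @ (lags[i] - lags[i + 1])           # Phi*_i times lagged difference
    return lags[0] + delta

rng = np.random.default_rng(0)
Phis = [0.3 * rng.standard_normal((2, 2)) for _ in range(3)]   # hypothetical VAR(3)
lags = [rng.standard_normal(2) for _ in range(3)]
same = np.allclose(var_step(Phis, lags), ecm_step(Phis, lags))
```

Both functions return the same one-step prediction, which confirms that the error correction form is a reparametrization of the VAR and not a different model.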

6. Inference and computations. 

6.1. Prior information. Full probabilistic inference for the model parameters
is carried out based on the following independent prior distributions. Throughout
we shall use $vec(\cdot)$ to denote the vec operator and $G(a, b)$ to denote the Gamma
distribution with mean $a/b$ and variance $a/b^2$. Unless explicitly needed, full specifications
of the priors are only given for $X$ so that definitions for $Y$ follow accordingly.

Measurement equation

The precision matrix $\Sigma_{u_x}^{-1}$ is assumed to be diagonal where each element has a
Gamma prior distribution, $G(0.01, 0.01)$.

The prior distribution for $\beta^{(h_{x_j})}$ $(j = 1, \ldots, l)$ is $N(0, \sigma_\beta^2 I)$. Then, assuming
a constant conditional covariance matrix, the prior on the inverse covariance
matrix $T^{(h_{x_j})^{-1}}$ is given by the Wishart distribution (Mardia, Kent and Bibby,
1979), that is $T^{(h_{x_j})^{-1}} \sim W(\varrho_x, (\varrho_x S_x)^{-1})$, where $\varrho_x > 1$ and $S_x$ is a pre-specified
symmetric positive definite matrix. To provide the prior specification for
the joint distribution of the spatial regression parameters we set $F_{iu}^{(h_{x_j})} = F^{(h_{x_j})}$
and, following Sain and Cressie (2007), we use the re-parametrization
$\tilde{F}^{(h_{x_j})} = T^{(h_{x_j})^{-1/2}} F^{(h_{x_j})} T^{(h_{x_j})^{1/2}}$ and specify its prior to be proportional
to $\exp\{-v'v/\varsigma^2\}$, where $v = vec(\tilde{F}^{(h_{x_j})})$. The prior parameter $\varsigma$ is specified
by choosing small values, since the prior for $\tilde{F}^{(h_{x_j})}$ is concentrated around zero. Then, in both mean and
variance of the GMRF processes we adopt priors centered around pre-fixed values,
as defined in section 3.
State equation

When stationarity conditions are met for the latent processes the prior distributions
for the state equation coefficients can be specified as proposed in Lopes et al.
(2008). For the cointegration case, since the formulation given in equation (11) is
quite general and many plausible restricted models can be envisaged, Stochastic
Search Variable Selection (SSVS) priors (see Jochmann et al., 2011) are used for
the parameters of the state equations. Note that these plausible models may differ in 
the choice of the restrictions on the cointegration space, the number of exogenous 
and endogenous latent variables, and the lag length allowed for the autoregression. 

The error covariance matrices are assumed to be decomposed as $\Sigma_\xi^{-1} = V_\xi V_\xi'$
and $\Sigma_\eta^{-1} = V_\eta V_\eta'$, where $V_\xi$ and $V_\eta$ are upper-triangular matrices. Then, the
SSVS priors involve using a standard Gamma prior for the square of each of the
diagonal elements of $V_{(\cdot)}$ and the SSVS mixture of normals prior for each element
above the diagonal (George, Sun and Ni, 2008). Note that if the error covariance
matrices are chosen to be diagonal, then the computation of the posterior simplifies 
considerably. 

Since $A$ is potentially of reduced rank and crucial issues of identification may
arise in the ECM form, linear identifying restrictions are usually imposed. However,
because of local identifiability problems and the restriction on the estimable
region of the cointegrating space (Koop et al. 2006), the so-called linear normalization
approach also suffers from several drawbacks. To overcome these problems,
we thus adopt the SSVS approach proposed by Jochmann et al. (2011) which, 
defining priors on the cointegration space, is facilitated by the computation of 
Gaussian posterior conditional distributions (Koop, Leon-Gonzalez and Strachan, 
2009). A brief summary of the SSVS priors used in this paper is provided in Ap- 
pendix B. For a more complete description, the reader is referred to Jochmann et 
al. (2011) and Koop et al. (2009). 

Finally, the prior for the latent process $\alpha(t)$ is provided by the transition equation
and is completed by $\alpha(0) \sim N(\alpha_0, \Sigma_{\alpha_0})$, for known hyperparameters $\alpha_0$ and
$\Sigma_{\alpha_0}$ (Durbin and Koopman, 2001; Rosenberg, 1973).

6.2. The likelihood function. To specify the likelihood function, without loss
of generality, it will be assumed that $m_y(t) = 0$ and $m_x(t) = 0$. Conditional on
$\alpha(t)$, for $t = 1, \ldots, T$, the SD-SEM model can be rewritten as $Z = \alpha H' + U$,
where $Z = [z(1), \ldots, z(T)]'$ and $\alpha = [\alpha(1), \ldots, \alpha(T)]'$. The error matrix, $U$, is
of dimension $(T \times n)$, where $n = \tilde{n}_x + \tilde{n}_y$, and follows a matrix-variate normal
distribution, i.e. $U \sim N(0, I_T, \Sigma_u)$ - see Dawid (1981) and Brown, Vannucci and
Fearn (1998). Then the deviance, minus twice the log-likelihood, is

$$D(Z \mid \Theta, \Sigma_u, H, \alpha, m, l) = Tn \log(2\pi) + T \log|\Sigma_u| + \mathrm{trace}\{\Sigma_u^{-1} (Z - \alpha H')'(Z - \alpha H')\},$$

where $\Theta$ is the full set of model parameters.
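
As an illustration, the deviance of section 6.2 can be evaluated numerically as in the minimal sketch below. It assumes NumPy is available; the function name `deviance` and the argument names are hypothetical and simply mirror the symbols Z, alpha, H and Sigma_u of the text.

```python
import numpy as np

def deviance(Z, alpha, H, Sigma_u):
    """Minus twice the matrix-normal log-likelihood of Z = alpha H' + U,
    with U ~ N(0, I_T, Sigma_u). Z is (T x n), alpha is (T x q), H is (n x q)."""
    T, n = Z.shape
    resid = Z - alpha @ H.T                       # (T x n) residual matrix
    sign, logdet = np.linalg.slogdet(Sigma_u)     # stable log-determinant
    if sign <= 0:
        raise ValueError("Sigma_u must be positive definite")
    # trace{Sigma_u^{-1} resid' resid}, without forming the inverse explicitly
    quad = np.trace(np.linalg.solve(Sigma_u, resid.T @ resid))
    return T * n * np.log(2.0 * np.pi) + T * logdet + quad
```

When the residuals are exactly zero and Sigma_u is the identity, only the constant term Tn log(2 pi) remains, which gives a quick sanity check.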

6.3. Posterior inference. Posterior inference for the proposed class of spatial
dynamic factor models is facilitated by MCMC algorithms. Standard MCMC algorithms for
dynamic linear models are adapted to our model specification such that, conditional
on $l$ and $m$, posterior and predictive analysis are readily available. In the following,
we provide some information on the relevant conditional distributions. Denoting
the unobserved data with the suffix $u$, posterior inference is based on
summarizing the joint posterior distribution $p(Z^{u}, \Theta, \alpha(0), \alpha \mid Z)$.

The common factors are jointly sampled by means of the well-known forward
filtering backward sampling (FFBS) algorithm (Carter and Kohn, 1994; Frühwirth-Schnatter,
1994). All other full conditional distributions are "standard" multivariate
Gaussian or Gamma distributions. An exception is for the spatial parameter matrices,
$F^{(h_{y_j})}$ and $F^{(h_{x_j})}$, and the covariance matrices, $T^{(h_{y_j})}$ and $T^{(h_{x_j})}$, which
are sampled using a Metropolis-Hastings step. Specific details for the implementation
of the full conditional distributions can be found in Lopes et al. (2008), Sain
and Cressie (2007), and Jochmann et al. (2011).
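
To make the FFBS step concrete, the following minimal sketch draws one path of a single latent factor. It is not the sampler used for the full model: it assumes a scalar observation equation z(t) = h alpha(t) + u(t) and a scalar first-order state equation alpha(t) = phi alpha(t-1) + eta(t), and all names (`ffbs_scalar`, `var_u`, `var_a`, `a0`, `p0`) are hypothetical.

```python
import random

def ffbs_scalar(z, h, phi, var_u, var_a, a0, p0, rng):
    """One joint draw of alpha(1..T) given z for the scalar model
    z(t) = h*alpha(t) + u(t),  alpha(t) = phi*alpha(t-1) + eta(t)."""
    m, c = [], []                       # filtered means and variances
    m_prev, c_prev = a0, p0
    for obs in z:                       # forward (Kalman) filtering
        a = phi * m_prev                # prior mean of alpha(t)
        r = phi * phi * c_prev + var_a  # prior variance of alpha(t)
        q = h * h * r + var_u           # one-step forecast variance of z(t)
        gain = r * h / q
        m_prev = a + gain * (obs - h * a)
        c_prev = r - gain * h * r
        m.append(m_prev)
        c.append(c_prev)
    T = len(z)
    alpha = [0.0] * T
    alpha[-1] = rng.gauss(m[-1], c[-1] ** 0.5)
    for t in range(T - 2, -1, -1):      # backward sampling
        r_next = phi * phi * c[t] + var_a
        b = c[t] * phi / r_next
        mean = m[t] + b * (alpha[t + 1] - phi * m[t])
        var = c[t] - b * phi * c[t]
        alpha[t] = rng.gauss(mean, max(var, 0.0) ** 0.5)
    return alpha
```

Within an MCMC sweep a draw such as `ffbs_scalar(z, 1.0, 0.9, 0.5, 1.0, 0.0, 1.0, random.Random(1))` would be repeated at every iteration, conditional on the current values of the other parameters.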

6.4. Model identification. Some restrictions on $H_y$ and $H_x$ are needed to define
a unique model free from identification problems. Several possibilities can be
considered and the solution adopted here is to constrain the measurement matrices 
so that they are lower triangular, assumed to be of full rank. We note here that we 
have proper but quantitatively vague priors which can lead to posteriors that are 
computationally indistinguishable from improper ones with the consequence of an 
MCMC convergence failure. Hence, to avoid relying so strongly on the prior spec- 
ification, we prefer to focus on models which are identified in a frequentist sense. 
The approach is fully discussed in Ippoliti et al. (2012) and Strickland et al. (2011).

A critical point to bear in mind is that the chosen order of the univariate
time series in the measurement vector influences the interpretation of the factors,
if such is desired, and may affect model fitting and assessment as well as
the choice of the number of factors. In such cases, the ordering
becomes a modeling decision to be made on substantive grounds, rather than
an empirical matter to be addressed on the basis of model fit. However, from the
viewpoint of forecasting the ordering is irrelevant. For a detailed discussion on
these points see, for example, Lopes and West (2004).

6.5. Model selection. With this class of model, an important issue is the selec- 
tion of m and I. Several Bayesian selection methods have been developed and for 
a discussion, see for example section 4.1 in Lopes et al. (2008). Here, we consider
a simple approach which only considers the variable of interest, Y, and that consists
in the minimization of the following predictive model choice statistic (PMCC,
Gelfand and Ghosh, 1998)

$$\mathrm{PMCC} = \frac{\zeta}{\zeta + 1} G + P,$$

where, for our proposed model, $G = \sum_{i,t} (Y(s_i,t) - E[Y(s_i,t)_{rep}])^2$ and
$P = \sum_{i,t} \mathrm{Var}[Y(s_i,t)_{rep}]$.

This statistic is based on replicates, $Y(s,t)_{rep}$, of the observed data and the
summation is taken over $i = 1, \ldots, N$, and $t = 1, \ldots, T$. Essentially, the PMCC
quantifies the fit of the model by comparing features of the posterior predictive
distribution, $p(Y(s,t)_{rep} \mid Y(s,t))$, to equivalent features of the observed data. The
quantity G is a measure of goodness of fit while P is a penalty term. As the models
become increasingly complex the goodness of fit term will decrease but the
penalty term will begin to increase: overfitting results in large predictive
variances and hence in large values of the penalty term. The choice of $\zeta$ determines
how much weight is placed on the goodness of fit term relative to the penalty term.
As $\zeta$ goes to infinity, equal weight is placed on these two terms. Banerjee, Carlin
and Gelfand (2004) mention that the ordering of models is typically insensitive to the
choice of $\zeta$, therefore we fix $\zeta = \infty$. Notice that at each iteration of the MCMC we
can obtain replicates of the observations given the sampled values of the parameters.
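
The computation of the PMCC from MCMC output can be sketched as follows. This is only an illustration under stated assumptions: the replicates are stored as one list per MCMC iteration, aligned with the flattened observations, the predictive variance is estimated by the population variance of the draws, and the function name `pmcc` is hypothetical.

```python
from statistics import fmean, pvariance

def pmcc(y_obs, y_rep, zeta=float("inf")):
    """y_obs: observed values Y(s_i, t), flattened over i and t.
    y_rep: list of replicate draws, each a list aligned with y_obs.
    Returns (PMCC, G, P)."""
    per_obs = list(zip(*y_rep))                  # draws grouped by observation
    means = [fmean(d) for d in per_obs]          # E[Y_rep]
    G = sum((y - m) ** 2 for y, m in zip(y_obs, means))
    P = sum(pvariance(d) for d in per_obs)       # sum of predictive variances
    weight = 1.0 if zeta == float("inf") else zeta / (zeta + 1.0)
    return weight * G + P, G, P
```

With the default weight (the limit of an infinite weighting constant) the statistic reduces to G + P, so candidate values of m and l can be ranked by calling the function once per fitted model.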

7. Uses of the model. In this section we provide specific details on how to 
obtain temporal forecasts of the variable of interest Y. 

7.1. Unconditional forecasting. Temporal forecasts of the variable Y are directly
obtained through the state space formulation of the model. In fact, it is easy
to show that since $\alpha(t) \mid \alpha(t-1) \sim N(\Phi \alpha(t-1), \Sigma_\alpha)$, the $k$-step ahead forecast
for the dynamic factors is given by $p(\alpha(t+k) \mid \Theta) \sim N(\Phi^{k} \alpha(t), \Omega^{(k)})$, where
$\Omega^{(k)} = \sum_{j=1}^{k} \Phi^{k-j} \Sigma_\alpha \Phi^{(k-j)'}$. Therefore, the $k$-step ahead predictive density,
$p(z(t+k) \mid Z)$, of the joint process $Z = [Y \; X]$ is given by

$$p(z(t+k) \mid Z) = \int p(z(t+k) \mid \alpha(t+k), H, \Theta)\, p(\alpha(t+k) \mid \alpha(t), H, \Theta)\, p(\alpha(t), H, \Theta \mid Z)\, d\alpha(t+k)\, d\alpha(t)\, dH\, d\Theta.$$

Draws from $p(z(t+k) \mid Z)$ can be obtained in three steps. Firstly, $\Theta$ is sampled
from its joint posterior distribution via MCMC. Secondly, conditionally on $\Theta$, the
common factors $\alpha(t+k)$ are independent of Z and can be sampled from
$p(\alpha(t+k) \mid \Theta)$. Thirdly, $z(t+k)$ is sampled from $p(z(t+k) \mid \alpha(t+k), H, \Theta)$.
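
The mean and covariance of the k-step ahead factor forecast in section 7.1 can be computed as in the sketch below. It assumes NumPy, a first-order transition matrix Phi and a state error covariance Sigma_a; the function name `factor_forecast` is hypothetical.

```python
import numpy as np

def factor_forecast(alpha_t, Phi, Sigma_a, k):
    """Mean Phi^k alpha(t) and covariance Omega^(k) of alpha(t+k) given alpha(t),
    for the transition alpha(t) | alpha(t-1) ~ N(Phi alpha(t-1), Sigma_a)."""
    q = Phi.shape[0]
    mean = np.linalg.matrix_power(Phi, k) @ alpha_t
    Omega = np.zeros((q, q))
    for j in range(1, k + 1):
        P = np.linalg.matrix_power(Phi, k - j)   # Phi^(k-j)
        Omega += P @ Sigma_a @ P.T
    return mean, Omega
```

A draw of the factors at horizon k can then be generated from the multivariate normal with these two moments, once for each retained MCMC draw of the parameters.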

7.2. Conditional forecasting. The forecasting procedure described above is 
obtained under the hypothesis that the predictor X is unknown for the period of 
interest. However, quite flexible forecasts can also be obtained conditional on the 
potential future paths of specified variables in the model. In fact, it may happen 
that some of the future values of certain variables are known, because data on these 
variables are released earlier than data on the other variables. By incorporating the 
knowledge of the future path of the X variable, in principle it should be possible 
to obtain more reliable forecasts of Y. 

Another use of conditional forecasting is the generation of forecasts conditional 
on different policy/exploratory scenarios. These scenario-based conditional fore- 
casts allow one to answer the question: if something happens to X in the future, 
how will it affect forecasts of Y in the future? Hence, a plurality of plausible
alternative futures for X can be considered and temporal forecasts of $g(t)$ can
be produced conditional on a specific path of $f(t)$. Under these assumptions, in
the following, we propose a simple procedure to obtain $g(T+k)$ given
$f(T+1), \ldots, f(T+k)$, and all present and past information, thus avoiding the use of
equation (4) to obtain $k$-step ahead forecasts of $f(t)$.

Suppose that for the period $T+1, T+2, \ldots, T+k$, X is known (or fixed a priori)
and that $X^{(k)} = [x(T+1), x(T+2), \ldots, x(T+k)]$. Then, $k$-step ahead forecasts
of $g(t)$ may be obtained conditional on $F^{(k)} = [f(T+1), f(T+2), \ldots, f(T+k)]$,
where $F^{(k)} = H_x^{+} X^{(k)}$ and $H_x^{+}$ is the Moore-Penrose pseudo-inverse of $H_x$.
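
The mapping from a known future path of X to the implied factor path can be sketched with NumPy as follows. The function name `implied_factors` is hypothetical, and the sketch assumes that the loading matrix has full column rank, so that the pseudo-inverse recovers the factors exactly when X lies in the column space of the loadings.

```python
import numpy as np

def implied_factors(H_x, X_future):
    """Exogenous factor paths implied by known future values of X.
    H_x: loading matrix (rows: X series, columns: factors);
    X_future: one column per forecast period T+1, ..., T+k."""
    return np.linalg.pinv(H_x) @ X_future        # Moore-Penrose pseudo-inverse

# Small check: factors are recovered when X_future = H_x @ F exactly.
H_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
F_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
F_hat = implied_factors(H_demo, H_demo @ F_demo)
```

In practice X contains measurement error, so the result is the least-squares projection of the future X values onto the loadings rather than an exact recovery.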

Finally, note that although spatial interpolation is not a main task in lattice data 
applications, the reconstruction of missing data (i.e. partial or total reconstruction
of the multivariate time series of one - or more - State) is an important issue in 
general. This can be simply done by exploiting the conditional expectation of the 
GMRF and following section 6.2 in Ippoliti et al. (2012). 

8. Spatio-temporal analysis of US housing prices. Public policy interven- 
tions in housing markets are widespread and a key question is the extent to which 
these policies achieve their desired objectives and whether there are any unin- 
tended consequences. Especially for its relationship with mortgage behavior, in 
recent years, real housing prices have been of great concern for many financial in- 
stitutions. Understanding the impact of specific factors on real housing prices is 
thus of great interest for governments, real estate developers and investors. In this 
paper, we examine if the total personal income (TPI) and the unemployment rate 
(UR) have some impact on the housing price index (HPI). The data, introduced 
in section 1.1, consist of quarterly time series on 48 States (excluding Alaska and
Hawaii) from 1984 (first quarter) to 2011 (fourth quarter). However, in this study,
the last 10 quarters have been excluded from the estimation procedure and used 
only for forecast purposes. 

In order to consider per capita personal income (PCI), the annual population 
series (U.S. Census Bureau) is converted into a quarterly series through geometric 
interpolation. Moreover, we consider real per capita personal income (RPCI) and 
housing price index (RHPI) dividing PCI and HPI by State level general price in- 
dex. However, since there is no US State level consumer price index (CPI), follow- 
ing Holly et al. (2010), we have constructed a State level general price index based 
on the CPIs of the cities or areas. All the variables are analyzed on a logarithmic 
scale. Henceforth, the variables are denoted as $y = \log(\mathrm{RHPI})$, $X_1 = \log(\mathrm{RPCI})$
and $X_2 = \log(\mathrm{UR})$.

Model Specification: measurement equations 

To provide a full specification of the inverse covariance matrix of each factor load- 
ing, we make use of a contiguity or adjacency matrix W. We assume here that W 
has zero diagonal elements and non-negative off-diagonal elements which reflect 
the dependency between States $s_i$ and $s_j$ - i.e. the neighborhood set of $s_i$. Hence, to
postulate plausible relationships between two States, as in Holly et al. (2010), we
assume that W is a binary proximity matrix which assigns uniform weights to all
neighbors of State $s_i$, that is

$$w_{ij} = \begin{cases} 1 & \text{if States } s_i \text{ and } s_j \text{ share a common border,} \\ 0 & \text{otherwise.} \end{cases}$$
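
A binary contiguity matrix of this kind can be built from a list of border pairs, as in the minimal sketch below. The function name `adjacency_matrix` and the four-State example are illustrative only; the analysis in the paper uses all 48 contiguous States.

```python
def adjacency_matrix(states, borders):
    """Symmetric 0/1 matrix W with W[i][j] = 1 when states i and j share a border."""
    index = {s: i for i, s in enumerate(states)}
    W = [[0] * len(states) for _ in states]
    for a, b in borders:
        if a == b:
            continue                      # keep the diagonal at zero
        i, j = index[a], index[b]
        W[i][j] = W[j][i] = 1
    return W

# Illustrative subset: California, Nevada, Arizona and Texas.
states = ["CA", "NV", "AZ", "TX"]
borders = [("CA", "NV"), ("CA", "AZ"), ("NV", "AZ")]
W = adjacency_matrix(states, borders)
```

In this subset Texas has no neighbors, so its row and column are zero; in the full lattice every State has at least one neighbor.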



Then, since the general model described in section 3 is overparameterized, it
is necessary to impose some parameter restrictions. For example, because Y is
univariate (i.e. $n_y = 1$), each column of $H_y$ (i.e. $h_{y_j}$) is treated as a univariate
GMRF with conditional mean, 

and conditional variance 

$$\mathrm{Var}[h_{y_j}(s_i) \mid R_{-i}] = \tau^{(h_{y_j})}.$$

On the other hand, since X is a bivariate process - i.e. $n_x = 2$ and $X(s,t) =
[X_1(s,t), X_2(s,t)]'$ - we assume that, for $i, u = 1, \ldots, n$, the $(2 \times 2)$ conditional
covariance matrix does not depend on the site indices and equals $T^{(h_{x_j})}$, and that

$$F^{(h_{x_j})} = \begin{pmatrix} f^{(j)}_{x_1} & f^{(j)}_{x_1,x_2} \\ f^{(j)}_{x_2,x_1} & f^{(j)}_{x_2} \end{pmatrix}.$$

Hence, the covariance matrix can be written as 



) 



where $W^{U}$ and $W^{L}$ denote the upper and lower triangular parts of W, respectively.
Conditions under which this matrix is positive definite depend on the parameter
space of the spatial interaction parameters in $F^{(h_{x_j})}$. However, restricting
the matrix to be strictly diagonally dominant, or adding a penalty if some of the
eigenvalues are negative, ensures positive definiteness (for a discussion on
this point see Sain and Cressie, 2007).

Since interpreting the spatial parameters in $F^{(h_{x_j})}$ requires some care, more
information on the impact of the choice of $F^{(h_{x_j})}$ can be obtained by examining the
conditional covariance of two neighboring locations (given the rest)



iu\—iu 



-1 



or, analogously, the conditional correlation matrix 



$$R_{iu|-iu} = A^{-1/2}\, \Sigma_{iu|-iu}\, A^{-1/2}, \qquad (12)$$

where $A = \mathrm{diag}(\Sigma_{iu|-iu})$.

The parameters for the priors on $\beta^{(h)}$, $T^{(h_{x_j})}$ and $F^{(h_{x_j})}$ are set as follows:
$\sigma^2 = 100$, $\varrho_x = 20$, $S_x = I$ and $\varsigma = 0.05$. The design matrix $D^{*}$ is specified
to represent a constant mean in space and we also consider $m_y(t) = m_y$ and
$m_x(t) = m_x$.

Model Specification: state equation 

Motivated by the debate on the possible existence of cointegration between RHPI, 
RPCI and UR we consider the cointegrated model specification as shown in section 
5.1. The temporal lag of the state equations has been fixed to 2 (i.e. $p^{*} = 2$), and
an increasing number of common factors, i.e. $2 \le m, l \le 12$, has been considered
for the model specification. Then the maximum possible numbers of cointegrating
relationships are defined as $r_d = m - 1$ and $r_f = l - 1$. Other modeling details,
including prior hyperparameter values, are defined in section 6 and Appendix B. 

Together with the model specification described above, hereafter denoted as $M_0$,
other models representing simplifications of $M_0$ were also considered for
comparison purposes. Specifically, to gauge the relative importance of the
different specifications used in $M_0$ (e.g. correlated factor loadings and cointegrated
factors), three models with the following assumptions were considered: i) uncorrelated
factor loadings and a simple VAR specification (i.e. without cointegration)
for the state equation ($M_1$), ii) uncorrelated factor loadings and cointegrated factors
($M_2$), iii) correlated factor loadings and a simple VAR specification (i.e. without
cointegration) for the factors ($M_3$). Finally, a fourth model ($M_4$), which is relatively
simple to estimate (see, for example, Lütkepohl, 2005) but has a completely different
structure, is also considered:

$$Y(s_i,t) = c(s_i,t)'\beta(s_i) + u_y(s_i,t),$$

where $c(s_i,t)$ is the vector containing the covariates $X_1$ and $X_2$ (including the intercept),
$\beta(s_i)$ is the corresponding vector of (site-specific) regression coefficients
and $u_y(s_i,t)$ is a VAR(2) process where the noise part of the model is assumed to
be distributed as a univariate GMRF (i.e. the noise is uncorrelated in time but it is
allowed to be spatially correlated). The introduction of a spatial (GMRF) prior on
the regression coefficients is also considered in the parametrization.

Model Estimation 

The identifiability constraints associated with the model to be estimated concern
the ordering of the States and the connection between the chosen ordering and the
specific form of the factor loading matrices $H_y$ and $H_x$. Unfortunately, no fixed
rules exist to select the States which must be constrained. In the following, we thus
discuss a possible strategy which exploits results from a cluster analysis performed
(before estimating the model) on the data matrices Y and X, respectively, of dimensions
$(\tilde{n}_y \times T)$ and $(\tilde{n}_x \times T)$. In this case, considering RHPI, the K-Means
classification algorithm is repeatedly run for a number of clusters equal to $m$, with
$2 \le m \le 12$. The States (one for each cluster) to be constrained are thus chosen as
the ones that: (possibly) belong to different BEA regions, show the highest mean
values of RHPI and/or are far apart from each other (especially when $m$ is larger
than the number of BEA regions). For a given $l$ such that $2 \le l \le 12$, the same
procedure is also applied to X and, whenever possible, the same States selected for
the housing prices are chosen. Note that especially in cases in which $l > m$, the
choice of the States within the clusters obtained for X can be made independently
of RHPI and based on several criteria such as the membership to different BEA regions
and/or highest (smallest) mean values of RPCI (UR). When $m > l$, the same
criteria can be adopted to choose the States among the ones already constrained in
$H_y$.

For each fitted model, the MCMC algorithm was run for 250,000 iterations.
Posterior inference was based on the last 150,000 draws using every 10th member
of the chain to reduce autocorrelation within the sampled values. Several MCMC
diagnostics could be used to test the convergence of the chains (see, for example,
Geweke, 1992; Gilks, Richardson and Spiegelhalter, 1996; Spiegelhalter et al.,
2002, and Jones et al., 2006). In our case, convergence of the chains of the model
was monitored visually through trace plots as well as using the $\hat{R}$-statistic of Gelman
(1996) on four chains starting from very different values.
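
The burn-in and thinning scheme, together with a basic version of the potential scale reduction factor, can be sketched as follows. This is an illustration only: the function names `thin` and `gelman_rubin` are hypothetical, the statistic is the simple between/within variance ratio for a scalar parameter, and refinements of the diagnostic are not included.

```python
from statistics import fmean, variance

def thin(chain, burn_in, step):
    """Drop the first burn_in draws and keep every step-th draw afterwards."""
    return chain[burn_in::step]

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length scalar chains."""
    n = len(chains[0])
    means = [fmean(c) for c in chains]
    W = fmean([variance(c) for c in chains])     # mean within-chain variance
    B = n * variance(means)                      # between-chain variance
    var_plus = (n - 1) / n * W + B / n           # pooled variance estimate
    return (var_plus / W) ** 0.5
```

With 250,000 iterations, a burn-in of 100,000 and a thinning step of 10, `thin` retains 15,000 draws per chain; values of the statistic close to 1 across the four chains are usually read as no evidence against convergence.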

Competing models were compared using the predictive model choice statistic,
PMCC, described in section 6.5. The PMCC criterion suggests that, for $M_0$, the
optimal choice is found with $m = 7$ and $l = 8$. The same number of components
is also confirmed for models $M_1$-$M_3$. However, for $M_3$, the best of the
three alternative models, the PMCC is 17% larger than for $M_0$, which denotes a much worse
model fit.

Notice that for $M_0$, the following States have been constrained in the factor
loading matrix $H_y$: North Carolina, Montana, California, Massachusetts, Texas,
Illinois and Arizona. Instead, considering $H_x$, we have constrained 5 States for
UR: North Carolina, California, Massachusetts, Texas and Illinois, and 3 States for
RPCI: Arizona, Montana and Massachusetts.

Factor loadings and common latent factors 

The MCMC estimates of the endogenous components, $g_i(t)$, $i = 1, \ldots, 7$, appear
as non-stationary processes, each representing specific features of the large-scale
temporal variability of the RHPI series. The first two latent components represent
common trends and are characterized by narrow 95% credibility intervals. Specifically,
the pattern of the first component, shown in Figure 2(a), highlights a growth
of RHPI since the early nineties up to 2006 followed by a sustained decrease. At
the national level, prices increased substantially from 2000 to the peak in 2006
and then fell very sharply across the country. An exploratory analysis
shows that this component tracks the pattern of the national RHPI, although the
latter seems to be a bit more volatile, especially in the period 1984-1994. We also
notice that this component is highly correlated (i.e. the correlation is in general
greater than 0.80) with all the State time series with the exception of Connecticut,
Texas and Oklahoma, for which the correlation is around 0.50.

The series of the second component, $g_2(t)$, shown in Figure 2(b), is characterized
by a price trough in the mid-1980s and mid-1990s followed by a mild price
peak. Then, the late 1990s begin with a dramatic and sustained increase. Examination
of the data plotted in Figure 1 shows that this pattern is typical of 50% of
the States of the Plains, Southeast and Rocky Mountain regions.

The remaining latent variables (not shown here) present some peculiarities for
the periods 1984-1990 and 2004-2007 and, compared with the first two factors, are
characterized by slightly wider credibility intervals.




Fig 2. Subplots (a) and (b): marginal posterior medians for the estimated latent factors $g_1(t)$ and
$g_2(t)$ (continuous line) and their 95% credible intervals (dashed line). Subplots (c) and (d): maps of
the posterior medians for the factor loadings $h_{y_1}$ and $h_{y_2}$ related to the real housing price index.

Figures 2(c-d) show the maps of the estimated first two factor loadings - i.e.
the first two columns of the measurement matrix $H_y$. The maps clearly show the
presence of clusters of US States. Table 2 also shows the posterior summaries of the
between-location conditional correlations estimated (using equation 12) for each
column of $H_y$. Since the 95% credibility intervals do not include zero, all the
conditional correlations appear to be significantly different from zero and the clusters are easily
identified by looking at the spatial patterns of the factor loadings.
Table 2
Posterior summary of the between-location conditional correlations for the columns of
the measurement matrix $H_y$. In brackets we show the 2.5 and 97.5 percentiles used for
defining the 95% credible interval limits.

Factor loadings ($H_y$)
          1            2            3            4            5            6            7
median    0.09         0.08         0.08         0.07         0.06         0.08         0.08
95% CI    [0.05,0.12]  [0.04,0.12]  [0.02,0.12]  [0.03,0.10]  [0.03,0.12]  [0.02,0.12]  [0.02,0.12]



Figure 2(c) shows (using the natural break method of ArcMap, ESRI, 2009)
the weights of the first factor loading, $h_{y_1}$. Except for Texas, Oklahoma and North
Dakota, these weights are all positive with the highest loadings observed in the
Pacific and North East regions, which strongly influence the contiguous regions.

Figure 2(d) also shows an interesting pattern in the loadings. Southwest, Rocky
Mountain States, some Plains States and Louisiana have positive loadings, while
the other States have negative loadings. The States with the highest loadings (Louisiana,
New Mexico, Texas, Oklahoma, North Dakota and Wyoming) show a temporal pattern
very similar to the second latent variable. On the other hand, the States with the
lowest values (California, Connecticut, Michigan, New Jersey and Rhode Island)
show temporal dynamics which, at least until the end of the nineties, are the
opposite of $g_2(t)$. Many of these States in the last 25 years have been particular
beneficiaries of new technologies. These innovations, interacting with restrictions on
new residential buildings, have resulted in real housing prices in these regions
deviating from the average across US States over a relatively prolonged period (Holly
et al., 2010). Also, considering the period 1984-1990, the spatial contrast highlighted
in the map of Figure 2(d) clearly confirms that while West-South-Central
regions (especially "oil-patch" states such as Texas and Oklahoma) experienced
sharp declines, the North East and California housing markets were booming. Note
that this map provides clear evidence of the results described in Table 1 where we
have found significant correlations between the States belonging to the East and
West regions.

The MCMC estimates of the exogenous components, $f_i(t)$, $i = 1, \ldots, 8$, summarize
the dynamics of RPCI and UR variables. The first three of these latent
factors, together with their 95% credibility intervals, are shown in Figure 3. These
components seem to have a substantial impact on RPCI and UR, although the latter
shows more complex dynamics which can be fully understood by examining the
behavior of all the estimated factors.

The first factor, $f_1(t)$, shows a cyclical behaviour with a slightly positive trend
in the period 1986-2000. The series exhibits a trough in the period 2000-2006 followed
by a sustained decrease. The 2000-2006 pattern has roots in the prior turmoil
in financial markets. In fact, the period 2000-2001 is characterized by a rapid decline
of high tech industries, a collapse of the stock market and a low level of
technology investment. The relaxed monetary policy adopted by the Federal Reserve
thus led to an increase of RPCI and a decrease of UR up to 2007.

The factor loadings related to $f_1(t)$, shown in Figure 3(b) and Figure 3(c), are
all positive for RPCI and negative for UR. Figure 3(b) clearly shows groups of
States with common spatial patterns. Specifically, we notice the presence of two
clusters: the first involves several States from Great Lakes, Southeast and New
England while the second is mainly characterized by Oregon and some States of the
Mountain region (Arizona, Utah, Nevada and Wyoming). Also, the highest values
are related to those States (Colorado, Connecticut, Georgia, Massachusetts, New
Jersey, North Carolina and Texas) whose RPCI shows the same cyclical pattern as
$f_1(t)$ in the period 1995-2009.

Figure 3(c), related to UR, shows a quite big cluster of States forming a ridge 
from Montana to Mississippi. For these States the variations of UR are less pro- 
nounced with respect to those showing the smallest loadings (e.g. Alabama, Col- 
orado, Indiana and Virginia). 

The dynamics of RPCI and UR in the first period of the series are captured by
the third latent factor $f_3(t)$ shown in Figure 3(g). The figure shows that the early
nineties are characterized by a trough of UR and a peak of RPCI.

Figure 3(h) shows a huge cluster with values of the loadings in the range 1.10-1.64;
the highest values are observed in the Southeast region, for which the oscillations
of RPCI are a bit more pronounced than in other States.

Figure 3(i) shows that the States for which the trough of UR is more pronounced
are characterized by the lowest values of the loadings. Notice that this figure also shows
a reasonable correspondence with Figure 2(d).

The second factor, $f_2(t)$, shows a decreasing trend associated with negative values
of $h_{x_2}$ - RPCI - and (mainly) positive values of $h_{x_2}$ - UR. The maps of the
factor loading clearly provide information on those States which have experienced
a positive trend for RPCI (e.g. Alabama, Arkansas, Mississippi, Nebraska, South
Dakota, Tennessee and Wyoming) as well as a downward trend for UR (see, for
example, Alabama, Iowa, Louisiana, Pennsylvania and West Virginia).

The spatial structure of the factor loadings is also confirmed by the posterior
summaries of their within- and between-location conditional correlations and
cross correlations (see Table 3). The 95% credibility intervals suggest that most
of these correlations can be considered as non-zero. Also, the conditional spatial
dependence of each factor loading is positive while both the between- and the
within-location conditional cross-correlations are negative.

Model Estimation: cointegration 

As noted in the introduction there has been a quite long debate in the literature 
about whether there is cointegration between real housing prices and fundamentals. 
Fig 3. Subplots (a), (d) and (g): marginal posterior medians for the estimated latent factors $f_1(t)$,
$f_2(t)$ and $f_3(t)$ (continuous line) and their 95% credible intervals (dashed line). Subplots (b), (e)
and (h): maps of the posterior medians for the factor loadings $h_{x_1}$, $h_{x_2}$ and $h_{x_3}$ related to the real
per capita personal income variable. Subplots (c), (f) and (i): maps of the posterior medians for the
factor loadings $h_{x_1}$, $h_{x_2}$ and $h_{x_3}$ related to the unemployment rate variable.
Table 3
Posterior summary of the within- and between-location conditional correlations and cross
correlations for the first three factor loadings columns related to the unemployment rate and real
per capita personal income variables. In brackets we show the 2.5 and 97.5 percentiles used for
defining the 95% credible interval limits.

Conditional correlation
Column  Within-location  Between-location  Between-location  Between-location  Between-location
        RPCI vs UR       RPCI              RPCI vs UR        UR vs RPCI        UR
1       -0.22            0.06              -0.04             -0.03             0.05
        [-0.44, -0.09]   [0.01, 0.09]      [-0.07, -0.02]    [-0.07, -0.01]    [0.03, 0.08]
2       0.02             0.05              -0.00             -0.01             0.07
        [-0.29, 0.12]    [0.01, 0.10]      [-0.06, 0.05]     [-0.06, 0.06]     [0.02, 0.09]
3       -0.27            0.08              -0.02             -0.04             0.07
        [-0.38, -0.02]   [0.02, 0.12]      [-0.07, 0.03]     [-0.07, -0.01]    [0.03, 0.10]



The idea is that in the absence of cointegration there are no fundamentals driving real
housing prices, and the absence of an equilibrium relationship would essentially
increase the likelihood of bubbles (Case and Shiller, 2003; Holly et al., 2010). Here,
we test the existence of this cointegrating relationship in a latent space, which avoids
having to account for the effect of cross-sectional dependence (see Holly et al., 2010
for a discussion on this point). In terms of cointegration ranks, following Jochmann
et al. (2011), our posteriors for $r_f$, $r_d$, $r_c$, $r_{c1}$ and $r_{c2}$ are obtained by considering
the draws of their respective matrices (i.e. $\Pi_f$, $\Pi_d$, $AB_g + A_2B_y$, $AB_g$ and
$A_2B_y$, see Appendix A), and taking the number of singular values greater than
0.05.
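
The rule of counting singular values above 0.05 can be turned into a posterior distribution over ranks as in the sketch below. It assumes NumPy and that the MCMC draws of a given matrix are available as a list of arrays of equal shape; the function name `rank_posterior` is hypothetical.

```python
import numpy as np

def rank_posterior(draws, tol=0.05):
    """Estimated posterior probabilities of the effective rank 0, 1, ..., max,
    where the effective rank of a draw is its number of singular values > tol."""
    max_rank = min(draws[0].shape)
    ranks = [int(np.sum(np.linalg.svd(M, compute_uv=False) > tol)) for M in draws]
    return np.bincount(ranks, minlength=max_rank + 1) / len(ranks)

# Toy example with four (2 x 2) draws of effective rank 1, 2, 2 and 0.
demo_draws = [np.diag([1.0, 0.01]), np.eye(2), np.eye(2), np.diag([0.01, 0.01])]
demo_probs = rank_posterior(demo_draws)
```

Entry r of the returned vector is the share of draws with effective rank r, which is the kind of quantity reported for each rank in Table 4.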

These are shown in Table 4 where we note that there is strong support for an
exogenous cointegration rank of either 4 or 5; for $r_d$ there is a hint of a rank equal
to 5, but small probabilities are also observed for 4 and 6. Finally, since there is
evidence that $r_c < r_{c1} + r_{c2}$, we may conclude that a cointegration structure is
confirmed between the endogenous and exogenous processes. Such a result thus
supports the existence of convergence to a stable equilibrium relationship
and, hence, the absence of a US housing price bubble for the period
considered in the study.



Table 4
Posterior of cointegration ranks $r_f$, $r_d$, $r_c$, $r_{c1}$ and $r_{c2}$

Estimated probabilities for effective ranks
          1      2      3      4      5      6
$r_f$     0.00   0.00   0.01   0.34   0.61   0.04
$r_d$     0.00   0.00   0.00   0.18   0.70   0.12
$r_c$     0.00   0.00   0.00   0.02   0.48   0.50
$r_{c1}$  0.00   0.00   0.10   0.63   0.27   0.00
$r_{c2}$  0.00   0.00   0.03   0.45   0.50   0.02

To provide further evidence that our approach is yielding sensible results, the use
of Bayes factors using a non-SSVS prior (Sugita, 2009; Kass and Raftery, 1995)
confirms that, conditionally on $m = 7$ and $l = 8$, results for $r_f$ and $r_d$ are similar
to those presented here.

Unconditional and conditional forecasts 

To test the predictive performance of the SD-SEM model, the last 10 quarters have
been excluded from the estimation procedure and used only for forecast purposes.
Hence, we consider the forecast for a horizon of $k = 10$ periods corresponding
to the quarters Q3-2009 to Q4-2011. Also, predictions of RHPI are obtained by
following two settings:

i) unconditional predictions: we only use past information; hence, X is not
available for the forecast period;

ii) conditional predictions: the exogenous variables $X_1$ and $X_2$ are assumed
known in the period in which temporal forecasts of RHPI are required.

For each State, both unconditional and conditional forecasts (together with 95%
credible intervals) of the housing price index are shown in Figure 4. In general,
the predictions are close to the true values and, as expected, the conditional
(on the known values of X) approach exhibits better out-of-sample properties,
with data points being more accurately predicted.

To provide some measures of goodness of prediction for the estimated models,
Table 5 gives details on the root mean squared prediction error,
$\mathrm{RMSE} = \sqrt{\mathrm{mean}\{(Y(s,t) - E[Y(s,t)^{rep}])^2\}}$, the mean
absolute deviation, $\mathrm{MAE} = \mathrm{mean}\,|Y(s,t) - E[Y(s,t)^{rep}]|$
(where Y is the variable at the original scale and the mean is taken over the
$(N \times k)$ observations), the coverage probabilities (CP) and the average
width (AIW) of the prediction intervals. We note that in the conditional case
model $M_0$ shows much smaller values for RMSE, MAE and AIW; on the other hand,
the coverage probabilities of the 95% intervals are larger than the nominal rate.
Models $M_1$, $M_2$ and $M_3$ provide very similar results and give some hints on
the role played by the spatially autocorrelated factor loadings and cointegrated
factors. In general, model $M_0$ works better than $M_1$, $M_2$ and $M_3$, for
which the prediction intervals are wider on average. We note that introducing
the spatial correlation reduces the AIW substantially. The same effect, albeit
with different intensity, can be observed when assuming cointegrated factors,
and this can be detected by contrasting models $M_0$-$M_3$ and $M_1$-$M_2$.
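
The four summaries reported in Table 5 can be computed as in the following minimal sketch. The function name `forecast_scores` and the toy numbers are hypothetical and are not data from the study; the point predictions play the role of the posterior predictive means $E[Y(s,t)^{rep}]$ and the bounds play the role of the 95% interval limits.

```python
import math


def forecast_scores(y_true, y_pred, lower, upper):
    """RMSE, MAE, coverage probability (CP) and average interval width (AIW)."""
    n = len(y_true)
    errors = [y - p for y, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # CP: share of observations falling inside their prediction interval.
    cp = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper)) / n
    # AIW: mean distance between upper and lower interval limits.
    aiw = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return {"RMSE": rmse, "MAE": mae, "CP": cp, "AIW": aiw}


# Hypothetical toy values: four state-quarter observations.
y_true = [100.0, 102.0, 98.0, 105.0]
y_pred = [101.0, 100.0, 98.0, 101.0]
lower = [95.0, 96.0, 93.0, 97.0]
upper = [107.0, 104.0, 103.0, 104.0]

scores = forecast_scores(y_true, y_pred, lower, upper)
print(scores)
```

In the application the mean would be taken over all $N \times k$ state-quarter pairs of the hold-out period.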

By making the series stationary through a first difference transformation, the
best result of model $M_4$ is characterized by an RMSE of 13.575 and an MAE of
11.372. This result is obtained by using a GMRF prior on the regression
coefficients. We also note that for this model the regressors, $X_1$ and $X_2$,
are assumed known for the forecast period. Producing unconditional predictions
under model $M_4$ is, in fact, not straightforward since it requires further
adjustments for predicting the process X.

Table 5

Root mean squared prediction errors (RMSE), mean absolute deviations (MAE), coverage
probabilities (CP) and average width (AIW) of the prediction intervals, for unconditional
and conditional forecasts of RHPI. The statistics are computed for the estimated models
$M_0$, $M_1$, $M_2$, $M_3$ and $M_4$.

Model    Type of prediction   RMSE      MAE       CP (95% interval)   AIW (95% interval)
$M_0$    Unconditional        16.081    11.704    0.958               59.762
         Conditional           7.223     5.558    0.989               54.723
$M_1$    Unconditional        17.294    12.950    1.000               140.052
         Conditional           9.497     6.614    1.000               138.140
$M_2$    Unconditional        17.496    12.942    0.989               112.814
         Conditional           9.904     7.414    0.998               112.086
$M_3$    Unconditional        17.150    12.759    0.969               77.042
         Conditional           9.331     6.695    0.985               76.445
$M_4$    Unconditional        -         -         -                   -
         Conditional          13.575    11.372    0.920               53.180



Multiplier analysis

We conclude the analysis by providing some results from multiplier analysis
(Lütkepohl, 2005), which is helpful to describe how the housing price index reacts
over time to exogenous impulses. In this case, we can check whether past values of
either RPCI or UR, observed in a specific State, contain useful information to
predict the variation of RHPI, in addition to the information in its own past
values. It can be shown (see Appendix C) that the dynamic multipliers,
$\boldsymbol{\Gamma}_k$, which reflect the marginal impacts of changes in the
predictors $X_1$ and $X_2$, are defined as

$$\boldsymbol{\Gamma}_k = \mathbf{H}_y \mathbf{J} \mathbf{Q}^{k} \mathbf{B} \mathbf{H}_x^{\dagger}, \qquad k = 0, 1, \ldots$$

where, at the k-th period (quarter), the (i, j)-th element of the $(N \times n_x)$
matrix $\boldsymbol{\Gamma}_k$ represents the response of the housing price in the
i-th State to a given shock in the predictor $X_l$, l = 1, 2, in State j, provided
the effect is not contaminated by other shocks to the system. The matrices
$\mathbf{J}$, $\mathbf{Q}$ and $\mathbf{B}$, which contribute to determine the
multipliers, are defined in Appendix C.
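
To illustrate how such multipliers evolve with the horizon, the following minimal sketch evaluates a product of the form $\mathbf{H}_y \mathbf{J} \mathbf{Q}^{k} \mathbf{B} \mathbf{H}_x^{\dagger}$ for k = 0, 1, .... The symbol names are an assumed reading of the formula above, the helper names are hypothetical, and the toy matrices (one factor, one lag, two States) are invented for illustration only.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dynamic_multipliers(H_y, J, Q, B, H_x_pinv, horizon):
    """Return [Gamma_0, ..., Gamma_horizon] with Gamma_k = H_y J Q^k B H_x^+."""
    left = matmul(H_y, J)          # H_y J, fixed across horizons
    right = matmul(B, H_x_pinv)    # B H_x^+, fixed across horizons
    size = len(Q)
    q_power = [[float(i == j) for j in range(size)] for i in range(size)]  # Q^0 = I
    out = []
    for _ in range(horizon + 1):
        out.append(matmul(matmul(left, q_power), right))
        q_power = matmul(q_power, Q)  # advance to Q^(k+1)
    return out


# Hypothetical toy system: N = 2 States, a single latent factor and a single lag.
H_y = [[1.0], [3.0]]        # loadings of the response on the latent factor
J = [[1.0]]                 # selection matrix
Q = [[0.5]]                 # companion (transition) matrix
B = [[2.0]]                 # impact of the exogenous factor
H_x_pinv = [[0.5, 0.5]]     # pseudo-inverse of the predictor loadings

gammas = dynamic_multipliers(H_y, J, Q, B, H_x_pinv, horizon=2)
for k, gamma in enumerate(gammas):
    print(k, gamma)
```

Because the toy transition matrix has modulus below one, the multipliers shrink geometrically with k; with unit roots in the transition matrix some responses would instead persist.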

The impulse responses of RHPI to a 1% shock in the exogenous variables, RPCI
and UR, in each State, show some interesting features. However, since many
possible interactions among States and variables can be envisaged, in the
following we provide a summary of the results as well as a visual impression of
some of the dynamic interrelationships existing in the system. Note that,
following Sims and Zha (1999) and Primiceri (2005), the credibility intervals of
the impulse response coefficients are discussed at the 16-th and 84-th
percentiles which, under normality, correspond to the bounds of a
one-standard-deviation interval.
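
The correspondence between the 16-th and 84-th percentiles and a one-standard-deviation band can be verified directly for the standard normal distribution, as in this short check based on the Python standard library (the variable names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Percentile ranks of the points one standard deviation below and above the mean.
lower_pct = 100 * std_normal.cdf(-1.0)
upper_pct = 100 * std_normal.cdf(1.0)

print(f"{lower_pct:.2f}, {upper_pct:.2f}")
```

The band between these two percentiles therefore carries roughly 68% of the probability mass, which is the narrower of the two credibility levels used in the impulse response plots.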

One interesting feature is that a shock in RPCI in the States belonging to New
England (with the exception of Connecticut and New Jersey) does not seem to
produce evident effects on RHPI. The same holds for an RPCI shock in Mideast
States, whose effects seem to disappear after one quarter. It thus seems that past
values of RPCI in these regions do not help in forecasting RHPI throughout the US.
At the same time, apart from New Hampshire and Maryland, the prices in New
England and Mideast do not seem to react to an RPCI shock in any other region.
The housing prices in Michigan, Ohio and Illinois, belonging to the Great Lakes,
also seem to behave similarly. Note that this similarity in behavior was also found
by Apergis and Payne (2012) in a study on housing price convergence.

On the other hand, there is stronger evidence of relationships between UR
shocks in the States of New England and Mideast and RHPI responses in
several States, mainly belonging to the Southeast, Plains and Southwest regions.
Also, RHPI forecasts in the New England and Mideast regions can be improved by
exploiting UR information on other States. In any case, considering the
intra-regional responses (i.e. RHPI responses of New England and Mideast States
to a UR shock produced in any State belonging to the same region), we note that
UR effects on the variation of RHPI disappear after one period.

Regarding the remaining BEA regions, a 1% shock to either RPCI or UR seems
to produce effects on the housing prices involving a quite large network of States,
particularly in the second quarter. Analyzing the impulse responses for longer
periods, we note that the network of relevant relationships between the States
becomes sparser. However, the most persistent effects on RHPI, which also involve
a large number of States belonging to the Southeast, Plains, Rocky Mountain,
Southwest and Far West regions, are associated with RPCI shocks in Nevada,
Arizona, Georgia, Alabama and Mississippi, and with UR shocks in Illinois, South
Carolina, Florida, Alabama, Iowa, South Dakota and Nebraska.

Moreover, the States whose RHPI responses are more persistent to RPCI shocks 
in any other State of the aforementioned regions are Florida and Nevada, while 
the States whose responses are more persistent to UR shocks are New Mexico, 
Arizona, Arkansas and Mississippi. 

If we consider the sign of the impulse response coefficients we note that, in
general, a positive shock to RPCI is associated with a positive effect on RHPI.
Some exceptions are observed in the first period, where we can find negative
coefficients. On the other hand, the scenario appears to be different for the UR
case, in which we note both positive and negative effects on RHPI even for longer
periods. Although we may expect unemployment to have an adverse effect on real
estate prices, previous studies have nevertheless found unemployment to be
positively related to housing prices. For a discussion on this point we refer the
reader, for example, to Vermeulen and Van Ommeren (2009), Clayton, Miller and
Peng (2010), and Moench and Ng (2011).

Finally, to provide a flavor of the type of relationships, Figure 5 shows
posterior mean housing price responses (solid line) in Nevada, Oregon, Arizona,
New Mexico, Utah, Idaho and California to a 1% shock to RPCI in Nevada. Figure 6,
instead, shows the responses in Florida, Tennessee, Alabama, Mississippi, Arkansas,
West Virginia, North Carolina and Georgia to a 1% shock to UR in Florida. The
shaded regions indicate the 68 and 90 percent credibility intervals. Overall, the
plots suggest that State-level responses follow a similar pattern (consistent with
the ripple effect) and, in most cases, the effects tend to decay over two years,
especially for UR shocks.

9. Discussion. In this paper we have discussed the modeling of spatio-temporal 
multivariate processes observed on a lattice by means of a Bayesian spatial dy- 
namic structural equation model. We have used ideas from factor analysis to frame 
and exploit both the spatial and the temporal structure of the observed processes. 

It can be shown that the SD-SEM encompasses a large class of spatial-temporal 
models that are commonly used and, more importantly, differs from them in two 
major aspects: i) it avoids the curse of dimensionality commonly present in large 
spatio-temporal data and ii) it facilitates the formation of spatial clusters which 
further avoids dimensionality issues. 

The model has been implemented in a Bayesian set-up using MCMC sampling. 
The MCMC chains of the parameters were monitored to detect possible problems 
in convergence although no such problems were found in the implementation. 

The model was applied to study the impact that real per capita personal income
and the unemployment rate may have on real housing prices in the USA using
State level data. Forecasting future economic conditions and understanding the
relations between the observed variables have been two important aspects
covered by our model. The spatial variation is brought into the model through the
columns of the factor loading matrix, and the estimated conditional correlations
and cross-correlations gave significant evidence of spatial dependence associated
with contiguity. The spatial patterns of the factor loadings revealed several
clusters of interest showing common dynamics.

The time series dynamics have been captured by common dynamic factors. An
error correction model specification, with a cointegrating relationship between
the common latent factors, was found useful once we took proper account of both
heterogeneity and cross-sectional dependence. Overall, results support the
hypothesis that real housing prices have been rising in line with fundamentals
(real incomes and unemployment rates), and there seems to be no evidence of
housing price bubbles at the national level.

Results from multiplier analysis were also helpful to describe how the housing
price index reacts over time to exogenous impulses. We have found that,
consistent with the ripple effect, the RHPI responses show a similar pattern for
neighboring States. The responses seem to be more persistent to UR shocks, while
the effects of an RPCI shock decay more rapidly, so that the system appears to
return faster to the initial equilibrium conditions.

A further important advantage of the model formulation is that it enables con- 
sideration of cases in which the temporal series of X are longer than those of Y. 
As noticed in section 7, this was particularly useful to improve the temporal predic- 
tions by conditioning on known values of the predictor providing a set of plausible 
scenarios for RHPI. 

Of course, we acknowledge that other possibilities could be considered for mod- 
eling the spatial structure and an example is provided by Wang and Wall (2003). 
An alternative scheme could also lead to the specification of common factors with a 
spatio-temporal structure. In this case, one may follow the methodology proposed 
in Debarsy, Ertur and LeSage (2012) to quantify dynamic responses over time and 
space as well as spacetime diffusion impacts. 

Finally, in this paper we have focused exclusively on normally distributed data. 
However, nonlinear and non-Gaussian spatio-temporal models have been exten- 
sively used in various areas of science, from epidemiology to meteorology and 
environmental sciences, among others. In this case, assuming the measurements 
belong to the exponential family of distributions, a generalized spatial dynamic 
structural equation model represents a natural extension of the SD-SEM discussed 
here. This extension will be a topic for future work. 

APPENDIX A: COINTEGRATED LATENT FACTORS AND THEIR VECTOR ERROR CORRECTION REPRESENTATION

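Let $\boldsymbol{\Phi}(z)$ denote the characteristic polynomial associated with the
vector ECM shown in (11) and let c be the number of unit roots of
$\det[\boldsymbol{\Phi}(z)]$. Assume also that $\mathrm{rank}(\mathcal{A}) = r$,
with $r = m + l - c$. Then, we assume that the latent exogenous variables,
$\mathbf{f}(t)$, are cointegrated with cointegrating rank $r_f$, so that
$r > r_f$ and $r_f < l$.

Let $\mathbf{Q}(\sum_{i=1}^{p}\boldsymbol{\Phi}_i)\mathbf{P} = \mathbf{J}$ be the
Jordan canonical form of $\sum_{i=1}^{p}\boldsymbol{\Phi}_i$, where
$\mathbf{Q} = \mathbf{P}^{-1}$ is an $((m + l) \times (m + l))$ matrix,
$\mathbf{J} = \mathrm{diag}(\mathbf{I}_{m-r_d}, \boldsymbol{\Lambda}_{r_d}, \mathbf{I}_{l-r_f}, \boldsymbol{\Lambda}_{r_f})$
and $r_d = r - r_f$ (Ahn and Reinsel, 1990; Cho, 2010). Because of the exogeneity
of $\mathbf{f}(t)$, the matrices $\mathcal{A}$ and $\boldsymbol{\Phi}_i$ are upper
block triangular matrices, that is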

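$$\mathcal{A} = \begin{bmatrix} \mathcal{A}_1 & \mathcal{A}_{12} \\ \mathbf{0} & \mathcal{A}_2 \end{bmatrix} \quad \text{and} \quad \boldsymbol{\Phi}_i = \begin{bmatrix} \boldsymbol{\Phi}_{1i} & \boldsymbol{\Phi}_{12i} \\ \mathbf{0} & \boldsymbol{\Phi}_{2i} \end{bmatrix}.$$

Then, consider the following matrix partition

$$\mathbf{P} = \begin{bmatrix} \mathbf{P}_1 & \mathbf{P}_{12} \\ \mathbf{0} & \mathbf{P}_2 \end{bmatrix}, \qquad \mathbf{Q} = \mathbf{P}^{-1} = \begin{bmatrix} \mathbf{P}_1^{-1} & -\mathbf{P}_1^{-1}\mathbf{P}_{12}\mathbf{P}_2^{-1} \\ \mathbf{0} & \mathbf{P}_2^{-1} \end{bmatrix} = \begin{bmatrix} \mathbf{Q}_1 & \mathbf{Q}_{12} \\ \mathbf{0} & \mathbf{Q}_2 \end{bmatrix},$$

with $\mathbf{Q}_1' = [\mathbf{Q}_1^{(1)}\ \mathbf{Q}_1^{(2)}]$,
$\mathbf{P}_1 = [\mathbf{P}_1^{(1)}\ \mathbf{P}_1^{(2)}]$,
$\mathbf{Q}_{12}' = [\mathbf{Q}_{12}^{(1)}\ \mathbf{Q}_{12}^{(2)}]$,
$\mathbf{P}_{12} = [\mathbf{P}_{12}^{(1)}\ \mathbf{P}_{12}^{(2)}]$,
$\mathbf{Q}_2' = [\mathbf{Q}_2^{(1)}\ \mathbf{Q}_2^{(2)}]$ and
$\mathbf{P}_2 = [\mathbf{P}_2^{(1)}\ \mathbf{P}_2^{(2)}]$.

Note that $\mathbf{Q}_1^{(1)}$, $\mathbf{P}_1^{(1)}$ are $(m \times (m - r_d))$,
$\mathbf{Q}_1^{(2)}$, $\mathbf{P}_1^{(2)}$ are $(m \times r_d)$,
$\mathbf{Q}_2^{(1)}$, $\mathbf{P}_2^{(1)}$ are $(l \times (l - r_f))$,
$\mathbf{Q}_2^{(2)}$, $\mathbf{P}_2^{(2)}$ are $(l \times r_f)$,
$\mathbf{Q}_{12}^{(1)}$ is $(l \times (m - r_d))$, $\mathbf{P}_{12}^{(1)}$ is
$(m \times (l - r_f))$, $\mathbf{Q}_{12}^{(2)}$ is $(l \times r_d)$ and
$\mathbf{P}_{12}^{(2)}$ is $(m \times r_f)$. Then, we may write

$$\mathbf{P}(\mathbf{I} - \mathbf{J})\mathbf{Q} = \begin{bmatrix} \mathbf{P}_1^{(2)} & \mathbf{P}_{12}^{(2)} \\ \mathbf{0} & \mathbf{P}_2^{(2)} \end{bmatrix} \begin{bmatrix} \mathbf{I} - \boldsymbol{\Lambda}_{r_d} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} - \boldsymbol{\Lambda}_{r_f} \end{bmatrix} \begin{bmatrix} \mathbf{Q}_1^{(2)\prime} & \mathbf{Q}_{12}^{(2)\prime} \\ \mathbf{0} & \mathbf{Q}_2^{(2)\prime} \end{bmatrix}$$

$$= \begin{bmatrix} \mathbf{P}_1^{(2)}(\mathbf{I} - \boldsymbol{\Lambda}_{r_d})\mathbf{Q}_1^{(2)\prime} & -\mathbf{P}_1^{(2)}(\mathbf{I} - \boldsymbol{\Lambda}_{r_d})\mathbf{Q}_1^{(2)\prime}\mathbf{P}_{12}\mathbf{Q}_2 + \mathbf{P}_{12}^{(2)}(\mathbf{I} - \boldsymbol{\Lambda}_{r_f})\mathbf{Q}_2^{(2)\prime} \\ \mathbf{0} & \mathbf{P}_2^{(2)}(\mathbf{I} - \boldsymbol{\Lambda}_{r_f})\mathbf{Q}_2^{(2)\prime} \end{bmatrix}$$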



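and equation (11) can thus be rewritten as

$$\Delta\mathbf{g}(t) = \mathbf{A}\mathbf{B}'\mathbf{d}(t-1) + \mathbf{A}_2\mathbf{B}_f'\mathbf{f}(t-1) + \sum_{i=1}^{p-1}\mathbf{K}_i\,\Delta\mathbf{d}(t-i) + \boldsymbol{\xi}(t) \tag{13}$$

$$\Delta\mathbf{f}(t) = \mathbf{A}_f\mathbf{B}_f'\mathbf{f}(t-1) + \sum_{i=1}^{p-1}\boldsymbol{\Phi}_{2i}\,\Delta\mathbf{f}(t-i) + \boldsymbol{\eta}(t) \tag{14}$$

where $\mathbf{A} = -\mathbf{P}_1^{(2)}(\mathbf{I} - \boldsymbol{\Lambda}_{r_d})$,
$\mathbf{B} = [\mathbf{I}\ \ {-\mathbf{P}_{12}\mathbf{Q}_2}]'\mathbf{Q}_1^{(2)}$,
$\mathbf{A}_f = -\mathbf{P}_2^{(2)}(\mathbf{I} - \boldsymbol{\Lambda}_{r_f})$,
$\mathbf{A}_2 = -\mathbf{P}_{12}^{(2)}(\mathbf{I} - \boldsymbol{\Lambda}_{r_f})$,
$\mathbf{B}_f = \mathbf{Q}_2^{(2)}$ and
$\mathbf{K}_i = [\boldsymbol{\Phi}_{1i}\ \boldsymbol{\Phi}_{12i}]$. Note that if
$\mathbf{P}_{12}$ and $\mathbf{P}_{12}^{(2)}$ are $\mathbf{0}$, then a separated
cointegrated structure exists for $\mathbf{g}(t)$ and $\mathbf{f}(t)$.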
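
The partitioned inverse of an upper block triangular matrix, and the resulting closed form for $\mathbf{P}(\mathbf{I} - \mathbf{J})\mathbf{Q}$, can be checked numerically. The sketch below does so in the simplest setting in which every block is a scalar (m = l = 1 and no unit roots, so that $\mathbf{J}$ reduces to a diagonal matrix of two stable eigenvalues); all numerical values and variable names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical scalar blocks of the upper block triangular matrix P.
p1, p12, p2 = 2.0, 3.0, 4.0
lam_d, lam_f = 0.5, 0.25          # stable eigenvalues on the diagonal of J

P = [[p1, p12], [0.0, p2]]
q1, q2 = 1.0 / p1, 1.0 / p2
Q = [[q1, -q1 * p12 * q2], [0.0, q2]]   # partitioned inverse of P
I_minus_J = [[1.0 - lam_d, 0.0], [0.0, 1.0 - lam_f]]

identity_check = matmul(P, Q)                   # should equal the identity
direct = matmul(matmul(P, I_minus_J), Q)        # brute-force P (I - J) Q

# Closed form, block by block, mirroring the partitioned expression.
closed = [
    [p1 * (1 - lam_d) * q1,
     -p1 * (1 - lam_d) * q1 * p12 * q2 + p12 * (1 - lam_f) * q2],
    [0.0, p2 * (1 - lam_f) * q2],
]

print(identity_check)
print(direct)
print(closed)
```

In this scalar case the lower-left block is exactly zero, which is the upper block triangular structure implied by the exogeneity of the latent factors.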

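Let $\mathbf{B} = [\mathbf{B}_1'\ \mathbf{B}_2']'$ where
$\mathbf{B}_1 = \mathbf{Q}_1^{(2)}$ and
$\mathbf{B}_2 = -\mathbf{Q}_2'\mathbf{P}_{12}'\mathbf{Q}_1^{(2)}$; then
$\mathcal{A}$ can be rewritten as

$$\mathcal{A} = \begin{bmatrix} \mathbf{A}\mathbf{B}_1' & \mathbf{A}\mathbf{B}_2' + \mathbf{A}_2\mathbf{B}_f' \\ \mathbf{0} & \mathbf{A}_f\mathbf{B}_f' \end{bmatrix}.$$

Also, let $r_f = \mathrm{rank}(\mathbf{A}_f\mathbf{B}_f')$,
$r_d = \mathrm{rank}(\mathbf{A}\mathbf{B}')$,
$r_c = \mathrm{rank}(\mathbf{A}\mathbf{B}_2' + \mathbf{A}_2\mathbf{B}_f')$,
$r_{c1} = \mathrm{rank}(\mathbf{A}\mathbf{B}_2')$ and
$r_{c2} = \mathrm{rank}(\mathbf{A}_2\mathbf{B}_f')$. Then, it follows that if
$\mathrm{rank}(\mathbf{A}\mathbf{B}_2' + \mathbf{A}_2\mathbf{B}_f') = 0$, no
cointegration structure exists between the endogenous and exogenous processes
$\mathbf{g}(t)$ and $\mathbf{f}(t)$.

APPENDIX B: THE SSVS PRIOR FOR THE VECTOR ECM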

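Since $\boldsymbol{\Pi}_{gd} = \mathbf{A}\mathbf{B}'$,
$\boldsymbol{\Pi}_{gf} = \mathbf{A}_2\mathbf{B}_f'$ and
$\boldsymbol{\Pi}_f = \mathbf{A}_f\mathbf{B}_f'$ are not unique, in this paper
we follow the approach proposed by Jochmann et al. (2011) and Koop et al. (2009)
to elicit the SSVS priors on the cointegration space. A summary of this approach
is provided below.

Specifically, a non-identified $(r_d \times r_d)$ symmetric positive definite
matrix $\mathbf{E}$ is introduced with the property
$\boldsymbol{\Pi}_{gd} = \mathbf{A}\mathbf{E}\mathbf{E}^{-1}\mathbf{B}' = \tilde{\mathbf{A}}\tilde{\mathbf{B}}'$,
where $\tilde{\mathbf{A}} = \mathbf{A}\mathbf{E}$ and
$\tilde{\mathbf{B}} = \mathbf{B}\mathbf{E}^{-1}$. The introduction of the
non-identified matrix $\mathbf{E}$ facilitates posterior computation because the
posterior conditional distributions of $\tilde{\mathbf{A}}$ and
$\tilde{\mathbf{B}}$ in the MCMC algorithm are Gaussian (Koop et al., 2009). The
same holds analogously for $\boldsymbol{\Pi}_{gf}$ and $\boldsymbol{\Pi}_f$.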

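Let $\mathbf{a} = \mathrm{vec}(\tilde{\mathbf{A}}')$ and
$\boldsymbol{\rho} = (\rho_1, \ldots, \rho_{\tilde m})$ a parameter vector, where
$\tilde m = m r_d$. Then, we assume that
$\mathbf{a} \mid \boldsymbol{\rho} \sim N(\mathbf{0}, \mathbf{V}_0)$, where
$\mathbf{V}_0 = \mathrm{diag}(v_1^2, \ldots, v_{\tilde m}^2)$,
$v_i^2 = (1 - \rho_i)v_{0i}^2 + \rho_i v_{1i}^2$ and $\rho_i$, the ith element of
$\boldsymbol{\rho}$, has a Bernoulli distribution with parameter $p_a$, i.e.
$\rho_i \sim Be(p_a)$. In this paper, we set $p_a = 0.5$,
$v_{0i}^2 = 0.1\,\hat\sigma^2(a_i)$ and $v_{1i}^2 = 10\,\hat\sigma^2(a_i)$, where
$\hat\sigma^2(a_i)$ is an estimate of the variance of the ith element of
$\mathbf{a}$ obtained from a preliminary MCMC run with a non-informative prior.
With appropriate notation, the same assumptions hold for
$\mathbf{a}_f = \mathrm{vec}(\tilde{\mathbf{A}}_f)$, with
$\tilde{\mathbf{A}}_f = \mathbf{A}_f\mathbf{E}_f$, and
$\mathbf{a}_2 = \mathrm{vec}(\tilde{\mathbf{A}}_2)$, with
$\tilde{\mathbf{A}}_2 = \mathbf{A}_2\mathbf{E}_f$.

The prior for the cointegration space is defined through
$\mathbf{b} \sim N(\mathbf{0}, \mathbf{I})$ and
$\mathbf{b}_f \sim N(\mathbf{0}, \mathbf{I})$, where
$\mathbf{b} = \mathrm{vec}(\tilde{\mathbf{B}})$,
$\mathbf{b}_f = \mathrm{vec}(\tilde{\mathbf{B}}_f)$ and
$\tilde{\mathbf{B}}_f = \mathbf{B}_f\mathbf{E}_f^{-1}$. The SSVS prior for
$\mathbf{k} = \mathrm{vec}([\mathbf{K}_1, \ldots, \mathbf{K}_{p^*-1}]')$ is given
by $\mathbf{k} \mid \boldsymbol{\delta} \sim N(\mathbf{0}, \mathbf{D})$, where
$\mathbf{D} = \mathrm{diag}(\tau_1^2, \ldots, \tau_{m(m+l)(p^*-1)}^2)$,
$\tau_i^2 = (1 - \delta_i)\tau_{0i}^2 + \delta_i\tau_{1i}^2$ and
$\boldsymbol{\delta}$ is an unknown vector with typical element
$\delta_i \sim Be(p_\tau)$. Here, we set $p_\tau = 0.5$,
$\tau_{0i}^2 = 0.1\,\hat\sigma^2(k_i)$ and $\tau_{1i}^2 = 10\,\hat\sigma^2(k_i)$,
where $\hat\sigma^2(k_i)$ is an estimate of the variance of the ith element of
$\mathbf{k}$ obtained from a preliminary MCMC run using a non-informative prior.
Analogously, we define
$\boldsymbol{\phi} = \mathrm{vec}([\boldsymbol{\Phi}_{21}, \ldots, \boldsymbol{\Phi}_{2p^*-1}]')$
and assume that
$\boldsymbol{\phi} \mid \boldsymbol{\delta}_\phi \sim N(\mathbf{0}, \mathbf{D}_\phi)$,
where $\mathbf{D}_\phi = \mathrm{diag}(\kappa_1^2, \ldots, \kappa_{l^2(p^*-1)}^2)$,
$\kappa_i^2 = (1 - \delta_{\phi i})\kappa_{0i}^2 + \delta_{\phi i}\kappa_{1i}^2$, and
$\boldsymbol{\delta}_\phi$ is an unknown vector with element
$\delta_{\phi i} \sim Be(p_\phi)$. Here we set $p_\phi = 0.5$,
$\kappa_{0i}^2 = 0.1\,\hat\sigma^2(\phi_i)$ and
$\kappa_{1i}^2 = 10\,\hat\sigma^2(\phi_i)$, where $\hat\sigma^2(\phi_i)$ is an
estimate of the variance of the ith element of $\boldsymbol{\phi}$ obtained from a
preliminary MCMC run using a non-informative prior.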
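
The two-component structure of the SSVS prior can be illustrated by simulating from it. The sketch below is a minimal illustration under the settings stated above (inclusion probability 0.5, prior variances equal to 0.1 and 10 times a preliminary variance estimate); the function names and the preliminary variance values are hypothetical, and the code only draws from the prior, it does not implement the posterior sampler.

```python
import random


def prior_variance(indicator, sigma2_hat, small=0.1, large=10.0):
    """Prior variance of one coefficient: spike (indicator 0) or slab (indicator 1)."""
    return (large if indicator else small) * sigma2_hat


def draw_ssvs(sigma2_hats, p_incl=0.5, rng=None):
    """Draw one coefficient vector and its indicators from the SSVS mixture prior."""
    rng = rng or random.Random()
    draws, indicators = [], []
    for s2 in sigma2_hats:
        indicator = 1 if rng.random() < p_incl else 0      # Bernoulli(p_incl)
        sd = prior_variance(indicator, s2) ** 0.5
        draws.append(rng.gauss(0.0, sd))                   # N(0, variance)
        indicators.append(indicator)
    return draws, indicators


# Hypothetical preliminary variance estimates for three coefficients.
sigma2_hats = [0.4, 1.0, 2.5]
coefs, flags = draw_ssvs(sigma2_hats, p_incl=0.5, rng=random.Random(1))
print(coefs, flags)
```

Coefficients whose indicator is 0 are drawn from the tight component and are therefore shrunk towards zero, which is how the prior performs variable (and, for the loading matrices, rank) selection.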

APPENDIX C: MULTIPLIER ANALYSIS 

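If the model contains integrated variables and the generation mechanism is
started at time t = 0, it readily follows that (Lütkepohl, 2005, p. 402-407)

$$\mathbf{g}(t) = \mathbf{J}\mathbf{Q}^{t}\mathbf{g}(0) + \sum_{i=0}^{t-1}\mathbf{J}\mathbf{Q}^{i}\mathbf{B}\,\mathbf{f}(t-i) + \sum_{i=0}^{t-1}\mathbf{J}\mathbf{Q}^{i}\mathbf{J}'\boldsymbol{\xi}(t-i) \tag{15}$$

where $\mathbf{J}$, $\mathbf{B}$ and $\mathbf{Q}$ are $(m \times (mp + ls))$,
$((mp + ls) \times l)$ and $((mp + ls) \times (mp + ls))$ matrices such that
$\mathbf{J} = [\mathbf{I}\ \mathbf{0}\ \cdots\ \mathbf{0}]$, and $\mathbf{B}$ and
$\mathbf{Q}$ are the associated companion-form matrices, the first block row of
$\mathbf{Q}$ being
$[\mathbf{C}_1\ \mathbf{C}_2\ \cdots\ \mathbf{C}_p\ \mathbf{D}_1\ \mathbf{D}_2\ \cdots]$.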

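Then, assuming without loss of generality $\mathbf{m}_x(t) = \mathbf{0}$ and
$\mathbf{m}_y(t) = \mathbf{0}$, it follows from the measurement equation (1) that,
denoting by $\mathbf{H}_x^{\dagger}$ the pseudo-inverse of $\mathbf{H}_x$, i.e.
$\mathbf{H}_x^{\dagger} = (\mathbf{H}_x'\mathbf{H}_x)^{-1}\mathbf{H}_x'$ with
$\mathbf{H}_x'\mathbf{H}_x$ invertible, the least-squares estimator of
$\mathbf{f}(t)$ is $\hat{\mathbf{f}}(t) = \mathbf{H}_x^{\dagger}\mathbf{x}(t)$.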
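
The least-squares step can be made concrete for a small case. The sketch below evaluates $(\mathbf{H}'\mathbf{H})^{-1}\mathbf{H}'\mathbf{x}$ for a loading matrix with two columns, using the explicit inverse of a 2 x 2 matrix; the function name, the loadings and the factor values are hypothetical, and the observations are generated without noise so that the estimate recovers the factors exactly.

```python
def least_squares_factors(H, x):
    """Least-squares estimate (H'H)^{-1} H'x for an n-by-2 loading matrix H."""
    # Normal-equation matrix H'H (2 x 2) and right-hand side H'x (2 x 1).
    g11 = sum(row[0] * row[0] for row in H)
    g12 = sum(row[0] * row[1] for row in H)
    g22 = sum(row[1] * row[1] for row in H)
    b1 = sum(row[0] * xi for row, xi in zip(H, x))
    b2 = sum(row[1] * xi for row, xi in zip(H, x))
    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("H'H is singular; H must have full column rank")
    # Explicit inverse of the symmetric 2 x 2 matrix applied to H'x.
    return [(g22 * b1 - g12 * b2) / det, (g11 * b2 - g12 * b1) / det]


# Hypothetical loadings for three observed series and two latent factors.
H_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
f_true = [2.0, 3.0]
x_obs = [sum(h * f for h, f in zip(row, f_true)) for row in H_x]  # noise-free x = H f

f_hat = least_squares_factors(H_x, x_obs)
print(f_hat)
```

With noisy observations the same formula returns the projection of the data onto the column space of the loadings rather than the exact factor values.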

Hence, from equations (2) and (15), it follows that the marginal impact of changes
of the predictor X(t) on the dependent variable Y(t) can be investigated through
the coefficient matrices

$$\boldsymbol{\Gamma}_k = \mathbf{H}_y\mathbf{J}\mathbf{Q}^{k}\mathbf{B}\mathbf{H}_x^{\dagger}, \qquad k = 0, 1, \ldots$$

Fig 4. Unconditional forecasts (dashed line), conditional forecasts (continuous line) and true data
(points) at the 48 United States; the 95% credible interval limits for the unconditional forecasts are
represented by dotted lines. The 95% credible interval limits for the conditional forecasts are
represented by the shaded area. Each subplot also shows the initials of the State.

Fig 5. Posterior mean impulse responses (solid line) of RHPI to an RPCI shock in Nevada. The
credibility intervals at 68% and 90% are represented by shaded areas. The responses are observed in:
Nevada (NV), Oregon (OR), Arizona (AZ), New Mexico (NM), Utah (UT), Idaho (ID) and California
(CA).

Fig 6. Posterior mean impulse responses (solid line) of RHPI to a UR shock in Florida. The
credibility intervals at 68% and 90% are represented by shaded areas. The responses are observed in:
Florida (FL), Tennessee (TN), Alabama (AL), Mississippi (MS), Arkansas (AR), West Virginia (WV),
North Carolina (NC) and Georgia (GA).



